Abstract. This paper presents an extension to Hoare logic for pointer
program verification. First, the Logic for Partial Functions (LPF) used by
VDM is extended to specify memory access through pointers and the memory
layout of composite types. Then, the concepts of data-retrieve functions
(DRFs) and memory-scope functions (MSFs) are introduced.
People can define DRFs to retrieve abstract values from interconnected
concrete data objects. The definition of the MSF corresponding to a
DRF can be derived syntactically from the definition of the DRF. This
MSF computes the set of memory units accessed when the DRF retrieves
an abstract value. This set of memory units is called the memory scope of
the abstract value. Finally, the proof rule for assignment statements in
Hoare logic is modified to deal with pointers. The basic idea is that a
virtual value remains unmodified as long as no memory unit in its scope is
overwritten. Another proof rule is added for memory allocation statements.
The consequence rule and the rules for control-flow statements
are slightly modified, but they are essentially the same as their original
versions in Hoare logic.

An example is presented to show the efficacy of this logic. We also give 
some heuristics on how to verify pointer programs. 



1 Introduction 

To reason about the correctness of programs, C.A.R. Hoare presented an axiomatic
system for specifying and verifying programs [1] [2]. However, this logic cannot
deal with pointer programs because of pointer aliasing, i.e. many pointers may
refer to the same location. A few extensions to Hoare logic have been made
to deal with pointers or shared mutable data structures [3] [1] [5]. Among them,
separation logic [5] is one of the most successful extensions. That logic uses
a memory model which consists of two parts: the stack and the heap. Pointers
can only refer to data objects in the heap. Separation logic extends the predicate
calculus with the separation operator, which can separate the heap into
disjoint parts. Hoare logic is then extended with a set of proof rules for
heap lookup, heap mutation and variable assignment. Though a few programs
have been used to demonstrate the potential of local reasoning for scalability [7],
verifying programs using separation logic is still very difficult.

This paper presents an extension to Hoare logic for the verification of pointer
programs. This logic uses an extension of the Logic for Partial Functions (LPF) [8]
to describe pre- and post-conditions of code fragments. Three type constructors
are introduced to construct the composite types and pointer types used in programs.
Several kinds of function symbols associated with these types, together with a
set of proof rules, are introduced to model and specify memory layout and memory
access in pointer programs.

In this logic, people can define recursive functions to retrieve abstract
values from interconnected concrete data objects. These functions are called
data-retrieve functions (DRFs). DRFs are recursively defined based on basic function
symbols and the memory access/layout function symbols. For each DRF $f$, there
is a memory-scope function $\mathfrak{M}(f)$ whose definition can be constructed
syntactically from the definition of $f$. If an application of $f$ results in an abstract
value, then the application of $\mathfrak{M}(f)$ to the same arguments results in the set of
memory units accessed during the application of $f$. During program execution, the
application of $f$ to the same arguments results in the same abstract value as long as no
memory unit in this set is modified.

In this logic, program specifications are of the form $P \vdash q\{c\}r$, where $P$ is
a set of LPF formulae (usually a set of function definitions), $c$ is a fragment
of code, and $q$, $r$ are the pre-condition and post-condition respectively. Such a
specification means that if all the formulae in $P$ hold for arbitrary program states,
and $c$ starts its execution from a program state satisfying $q$, then the state must
satisfy $r$ when $c$ stops.

This paper is organized as follows. An extension to LPF is presented in
Section 2. To model memory access and layout in pointer programs, several
kinds of new function symbols and constants are introduced into LPF. A set
of proof rules is introduced to specify these function symbols and constants.
In Section 3, the concepts of 'memory scope forms' of terms and 'memory scope
functions' (MSFs) are introduced. A proof rule is introduced to specify how
definitions of MSFs can be constructed. A property of memory scope forms
is also given in this section. The syntax of a small programming language is given
in Section 4. The semantics of this language is also briefly described
in this section. The syntax and meaning of program specifications are given in
Section 5. The extension to Hoare logic is presented in Section 6. The proof rule
for assignment statements is modified to deal with the pointer alias problem.
Another proof rule is introduced for memory allocation statements. Section 7
presents a formal verification of the running example in this paper. Section 8
gives some heuristics on program verification using our logic. Section 9 concludes
this paper.

In Appendix A, we verify another program which inserts a new node into a
binary search tree. In Appendix B, we use a simplified version of the Schorr-Waite
algorithm to show that our logic can help people think about program
verification at different levels of abstraction.

1.1 Preliminary of the logic for partial functions 

The logic for partial functions (LPF) used in Vienna Development Method 
(VDM) can reason about undefinedness, (abstract) types, and recursive partial 
function definitions. The syntax of LPF terms and formulae is briefly described 
here. A term of LPF can be one of the following forms: 

1. a variable symbol;

2. $f(e_1, \ldots, e_n)$, if $f$ is a function symbol, $arity(f) = n$ and $e_1, \ldots, e_n$ are terms;

3. $p\,?\,e_1 : e_2$, where $p$ is a formula.

A formula of LPF can be one of the following forms: 

1. a boolean-typed term;

2. ©; (© denotes the neither-true-nor-false value. It is originally represented
by the symbol * in LPF papers, but * is used to denote the memory access
function in this paper.)

3. $P(e_1, \ldots, e_n)$, if $P$ is a predicate symbol, $arity(P) = n$, and $e_1, \ldots, e_n$
are terms. In this paper, we view a predicate symbol as a boolean-typed
function symbol.

4. $e_1 = e_2$, where $e_1$, $e_2$ are terms;

5. $e : t$, where $e$ is a term and $t$ is a type symbol;

6. $\Delta A$, $\neg A$, $A_1 \wedge A_2$, if $A$, $A_1$, $A_2$ are formulae;

7. $\forall x : t \cdot A$, where $x$ is a variable, $t$ is a type symbol, and $A$ is a formula;

8. $f(x_1 : T_1, \ldots, x_n : T_n) = e$, where $e$ is a term, and all the free variables in $e$
are in the set $\{x_1, \ldots, x_n\}$.

For the proof rules, semantics and further details of LPF, we refer
readers to [5].

The LPF formulae used in our logic are subject to one constraint: the logical connectives,
©, $\Delta$ and quantifiers cannot occur in a term. Specifically, in a conditional form
$p\,?\,e_1 : e_2$, $p$ contains no logical connectives or quantifiers. However, we can use
operators such as cand, cor, . . ., in terms. These operators can be defined
using conditional forms. This constraint makes it possible to define the memory
scope form of terms.

2 The extension of the logic for partial functions 

In this paper, LPF is extended to deal with issues about memory access/layout, 
composite and pointer program types, data-retrieve functions and memory-scope 
functions. Now we first extend LPF with program types and associated function 
symbols. 



2.1 Program types and associated function symbols 

In LPF, a type can be either a basic type such as integer and boolean, or a type
constructed using type constructors such as SetOf, SeqOf and Map. However,
the abstract types constructed using these type constructors cannot be used
directly in imperative programs. To deal with the types that appear in programs, we
introduce three new type constructors into LPF: pointer (P), array (ARR), and
record (REC). We call the types that can appear in programs P-types.

1. integer and boolean are P-types;

2. Let $t, t_1, \ldots, t_k$ be P-types, $n_1, n_2, \ldots, n_k$ be $k$ different names, and $c$ be a
positive integer constant. Then $P(t)$, $ARR(t, c)$, and $REC((n_1, t_1) \times \ldots \times (n_k, t_k))$
are also P-types.

We allow a record type $t$ to have one or more fields of type $P(t)$, so that we can
deal with recursive data types in our programming language. We use Ptr as the super
type of all pointer types $P(t)$, where $t$ is a P-type. The abstract type constructors
Map, SetOf, SeqOf cannot be applied to composite program types. However,
these type constructors can be applied to pointer types to form new abstract
types. That is, we can get an abstract type $SetOf(P(t))$ for some P-type $t$, but
cannot get an abstract type $SetOf(REC((n_1, t_1) \times \ldots \times (n_k, t_k)))$.

The following constant and function symbols associated with P-types are 
introduced. 

1. A program can declare a finite set of program variables with P-types. For
each program variable $v$ declared with P-type $t$, $\&v$ is a constant with type
$P(t)$.

2. For each pointer type $t$, there is a $t$-typed constant $nil_t$. The type subscript
$t$ can be omitted if no ambiguity arises.

3. A partial function $* : Ptr \rightarrow Ptr \cup integer \cup boolean$. We write an
application of $*$ to $e$ as $*e$. For a non-nil pointer $r$ with type $P(t)$, where $t$ is
integer, boolean or a pointer type, $*r$ is a $t$-typed value. An application of
this function symbol models a memory unit access.

4. For each array type $t = ARR(t', c)$, there is a partial function $\&[\,]_t : P(t) \times
integer \rightarrow P(t')$. We write an application of such a function as $\&e[i]_t$ instead
of $\&[\,]_t(e, i)$. The type subscripts can be omitted if no ambiguity
arises. These function symbols model the memory layout of array types.
Intuitively speaking, if $e$ is a non-nil reference to a $t$-typed data object, $\&e[i]$
is the reference to the $i$th element. $\&e[i]$ is defined if and only if $e \neq nil$ and
$0 \le i < c$.

5. For each record type $t = REC((n_1, t_1) \times \ldots \times (n_k, t_k))$ and a name $n_i$
($1 \le i \le k$), we have a partial function $\&\!\rightarrow_t n_i : P(t) \rightarrow P(t_i)$. It is only
undefined on the constant $nil_t$. We write an application of this function
symbol to $e$ as $\&e \rightarrow_t n_i$. The type subscript $t$ can be omitted if
no ambiguity arises. These functions model the memory layout of record types.
Intuitively speaking, if $e$ is a non-nil reference to a record-typed data object,
$\&e \rightarrow_t n_i$ is the reference to the field $n_i$.
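The layout functions above can be pictured with a small executable model. The following Python sketch is an illustration only, not part of the logic: it assumes a flat address space in which a pointer is an integer address, every integer, boolean or pointer value occupies exactly one memory unit, and nil is modelled as `None`. The names `size_of`, `field_addr`, `elem_addr` and `deref` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# P-types: "integer", "boolean", pointer P(t), array ARR(t, c), record REC(...).
@dataclass(frozen=True)
class P:
    target: object  # pointed-to P-type (a type name is enough for recursion)

@dataclass(frozen=True)
class ARR:
    elem: object
    count: int

@dataclass(frozen=True)
class REC:
    fields: Tuple[Tuple[str, object], ...]  # ((name, type), ...)

def size_of(t) -> int:
    """Number of memory units in a block of type t (assumed flat layout)."""
    if isinstance(t, ARR):
        return t.count * size_of(t.elem)
    if isinstance(t, REC):
        return sum(size_of(ft) for _, ft in t.fields)
    return 1  # integer, boolean, or any pointer type

def field_addr(r: Optional[int], t: REC, name: str) -> Optional[int]:
    """&r -> name: undefined (None) only when r is nil."""
    if r is None:
        return None
    offset = 0
    for n, ft in t.fields:
        if n == name:
            return r + offset
        offset += size_of(ft)
    raise KeyError(name)

def elem_addr(r: Optional[int], t: ARR, i: int) -> Optional[int]:
    """&r[i]: defined iff r is non-nil and 0 <= i < c."""
    if r is None or not (0 <= i < t.count):
        return None
    return r + i * size_of(t.elem)

def deref(heap: dict, r: Optional[int]):
    """*r: read one memory unit; None when r is nil or not mapped."""
    if r is None:
        return None
    return heap.get(r)
```

In this model, a term such as `*(&e -> n)` corresponds to `deref(heap, field_addr(e, t, "n"))`; the partiality of the function symbols shows up as `None` results.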



The above function (and constant) symbols can be used in both programs 
and specifications. For conciseness, we use the following abbreviations. 

1. Let $v$ be a program variable declared with type integer, boolean or a
pointer type. Then $v$ is an abbreviation for $*(\&v)$.

2. For a program variable $v$ declared with an array type $ARR(t, c)$, where $t$ is
integer, boolean, or a pointer type, we use $v[e]$ as an abbreviation for
$*(\&(\&v)[e])$.

3. If $e$ is of type $P(t)$, $t$ is a record type of which $n$ is a field name, and the
field type is integer, boolean or a pointer type, we can use $e \rightarrow n$ as an
abbreviation for $*(\&e \rightarrow n)$.

4. Let $v$ be a program variable declared with a record type of which $n$ is a field
name, where the field type is integer, boolean or a pointer type. We can use $v.n$
as an abbreviation for $*(\&(\&v) \rightarrow n)$.

2.2 The proof rules about memory access and layout

In this subsection, we present some proof rules to specify memory unit access
and the memory layout of composite types. We define an auxiliary function Block :
Ptr $\rightarrow$ SetOf(Ptr) to denote the set of memory units in a memory block. The
definition of Block is as follows.
Block(r) = $\emptyset$ if r = nil. Otherwise Block(r) =

Intuitively speaking, Block(r) is the set of memory units in the memory block
referred to by r.

The rule MEM-ACC says that if $r$ denotes a non-nil pointer referring to a
memory unit storing basic type values or pointer values, $*r$ denotes a basic type
value or a pointer value respectively.

\[
\text{MEM-ACC}\quad
\frac{r : P(t) \qquad r \neq nil}{{*}r : t}
\quad t \text{ is integer, boolean, or } P(t') \text{ for some } t'
\]

The rule MEM-BLK specifies how memory blocks are allocated. Any two
different memory blocks are either disjoint from each other, or
one is contained in the other.

\[
\text{MEM-BLK}\quad
\frac{p : Ptr \qquad q : Ptr \qquad p \neq q}
{Block(p) \cap Block(q) = \emptyset \;\vee\; Block(p) \subset Block(q) \;\vee\; Block(q) \subset Block(p)}
\]

The following two rules specify how the memory blocks are allocated for de- 
clared program variables. The rule PVAR-1 says that for each program variable, 
a memory block with corresponding type is allocated. Furthermore, this block 
is not a sub-block of any other memory block. The rule PVAR-2 says that each 
program variable is allocated a separate memory block. 



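\[
\text{PVAR-1}\quad
\frac{}{\&v : P(t) \;\wedge\; \&v \neq nil \;\wedge\; \forall x : Ptr \cdot Block(\&v) \not\subset Block(x)}
\quad v \text{ is a program variable declared with type } t
\]

\[
\text{PVAR-2}\quad
\frac{}{\&v_1 \neq \&v_2}
\quad v_1, v_2 \text{ are two different program variables}
\]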



The following two rules specify the memory layout for record-typed memory 
blocks. The rule RECORD- 1 says that a record- typed memory block is allocated 
as a whole, i.e. when a record-typed memory block is allocated, all the memory 
blocks for its fields are also allocated. The rule RECORD-2 says that the memory 
blocks allocated for the fields are disjoint with each other. 

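\[
\text{RECORD-1}\quad
\frac{r : P(REC(\ldots \times (n, t) \times \ldots)) \qquad r \neq nil}
{(\&r \rightarrow n : P(t)) \;\wedge\; (\&r \rightarrow n \neq nil)}
\]

\[
\text{RECORD-2}\quad
\frac{r : P(REC(\ldots \times (n_1, t_1) \times \ldots \times (n_2, t_2) \times \ldots)) \qquad r \neq nil}
{Block(\&r \rightarrow n_1) \cap Block(\&r \rightarrow n_2) = \emptyset}
\]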



The following two rules specify the memory layout for array-typed memory 
blocks. The rule ARR-1 says that an array-typed memory block is allocated as 
a whole, i.e. when an array-typed memory block is allocated, all of the memory 
blocks for its elements are allocated. The rule ARR-2 says that the memory 
blocks allocated for different elements are disjoint with each other. 

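\[
\text{ARR-1}\quad
\frac{r : P(ARR(t, c)) \qquad r \neq nil \qquad 0 \le i < c}
{(\&r[i] : P(t)) \;\wedge\; (\&r[i] \neq nil)}
\]

\[
\text{ARR-2}\quad
\frac{r : P(ARR(t, c)) \qquad r \neq nil \qquad 0 \le i < c \qquad 0 \le j < c \qquad i \neq j}
{Block(\&r[i]) \cap Block(\&r[j]) = \emptyset}
\]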



2.3 The interpretation of P-types and new function symbols 

Note that the types of the constant symbols ($\&v$, $nil_t$) introduced
in this section are integer, boolean, or pointer types. The argument types and
result types of the function symbols introduced in this section are also integer,
boolean, and pointer types. So the terms in our logic do not denote array or
record P-type values. Thus structures for our logic do not have to interpret
record and array types.

For each P-type $t$, $(P(t))^{A}$ is a countably infinite set in the universal domain
$U^{A}$ satisfying $(nil_{P(t)})^{A} \in (P(t))^{A}$. Furthermore, it is required that for
different P-types $t_1$ and $t_2$, $(P(t_1))^{A}$ and $(P(t_2))^{A}$ are disjoint. $Ptr^{A}$ is the
union of all such sets.

The function symbols $\&\!\rightarrow n$ and $\&[\,]$ model the memory layout of records and
arrays respectively. As we do not go into details about the memory layout of
composite types, we only require that all the proof rules in the previous subsection
are satisfied by the interpretation of these function symbols.

The function symbol $*$ models program states. Its interpretation must satisfy
$*^{A}(x) \in t^{A}$ if $x \in (P(t))^{A}$ and $x \neq (nil_{P(t)})^{A}$, where $t$ is integer, boolean,
or $P(t')$ for some $t'$; $*^{A}(x) = \bot$ otherwise.

3 Memory scope functions 

In LPF, a formula $f(x_1, \ldots, x_n) = e$ defines a function denoted by $f$. People can
define data-retrieve functions using such formulae. In definitions of DRFs, we
require that for any conditional sub-term $e_0\,?\,e_1 : e_2$ of $e$, none of the function
symbols occurring in $e_0$ is defined (directly or indirectly) based on $f$. So for each
DRF definition $f(x_1, \ldots, x_n) = e$, $e$ is continuous in $f$, and thus we can use the proof
rule Func-Ind in [8] to prove properties of DRFs.

Given an LPF term $e$, the memory scope form of $e$, denoted as $\mathfrak{M}(e)$, is
defined as follows.

1. If $e$ is a variable, $\mathfrak{M}(e)$ is $\emptyset$.

2. If $e$ is of the form $f(e_1, \ldots, e_n)$, $\mathfrak{M}(e)$ is $\mathfrak{M}(e_1) \cup \ldots \cup \mathfrak{M}(e_n) \cup \mathfrak{M}(f)(e_1, \ldots, e_n)$,
where $\mathfrak{M}(f)$ represents the MSF symbol of $f$, which is defined as follows.

- If $f$ is a function symbol associated with basic types or abstract types
(for example, $+, -, \times, /, >, <, \in, \subset, \ldots$), $\mathfrak{M}(f)$ is defined as the constant
$\emptyset$.

- If $f$ is $\&\!\rightarrow n$, $\&[\,]$, $\&v$ for some program variable $v$, or $nil_t$ for some type $t$,
$\mathfrak{M}(f)$ is defined as the constant $\emptyset$.

- If $f$ is the memory access function $*$ introduced in sub-section 2.1, $\mathfrak{M}(*)$
is defined as $\mathfrak{M}(*)(x) = \{x\}$.

- For any other function symbol, $\mathfrak{M}(f)$ represents a new function symbol
denoting the memory scope function of $f$.

3. If $e$ is of the form $e_0\,?\,e_1 : e_2$, $\mathfrak{M}(e)$ is $\mathfrak{M}(e_0) \cup (e_0\,?\,\mathfrak{M}(e_1) : \mathfrak{M}(e_2))$.
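Because the memory scope form is defined purely by the shape of a term, it can be computed by a small recursive transformation. The sketch below is one possible rendering in Python, under several assumptions that are not in the paper: terms are encoded as nested tuples (`("var", name)`, `("app", f, args)`, `("cond", e0, e1, e2)`), the symbols with an empty MSF are recognised by name, and empty sets are simplified away when building unions. The helper names `scope`, `union` and `has_empty_scope` are hypothetical.

```python
BASIC = {"+", "-", "<", "=", ">", "/"}  # symbols of basic/abstract types

def has_empty_scope(f: str) -> bool:
    """True for symbols whose MSF is the constant empty set (cases 2a, 2b)."""
    return f in BASIC or f.startswith("&") or f.startswith("nil")

def union(parts):
    """Build a union term, dropping empty sets (a convenience, not required)."""
    parts = [p for p in parts if p != ("empty",)]
    if not parts:
        return ("empty",)
    result = parts[0]
    for p in parts[1:]:
        result = ("union", result, p)
    return result

def scope(e):
    """Return the memory scope form M(e) of term e, itself encoded as a term."""
    kind = e[0]
    if kind == "var":
        return ("empty",)                      # rule 1
    if kind == "app":                          # rule 2
        _, f, args = e
        parts = [scope(a) for a in args]
        if f == "*":
            parts.append(("set", args[0]))     # M(*)(x) = {x}
        elif not has_empty_scope(f):
            parts.append(("app", "M(" + f + ")", args))  # fresh MSF symbol
        return union(parts)
    if kind == "cond":                         # rule 3
        _, e0, e1, e2 = e
        return union([scope(e0), ("cond", e0, scope(e1), scope(e2))])
    raise ValueError("unknown term kind: " + str(kind))
```

For instance, for the term `*(&x -> K)` encoded as `("app", "*", (("app", "&->K", (("var", "x"),)),))`, `scope` returns the singleton `("set", ("app", "&->K", (("var", "x"),)))`, i.e. the one memory unit that the access reads.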



Given a DRF $f$ defined as $f(x_1, \ldots, x_n) = e$, the memory scope function
$\mathfrak{M}(f)$ of $f$ is defined as

\[
\mathfrak{M}(f)(x_1, \ldots, x_n) = \mathfrak{M}(e)
\]

Formally, it is expressed using the following proof rule.

\[
\text{SCOPE-FUNC}\quad
\frac{f(x_1, \ldots, x_n) = e}{\mathfrak{M}(f)(x_1, \ldots, x_n) = \mathfrak{M}(e)}
\]

Note that for any sub-term $e_0\,?\,e_1 : e_2$ of $\mathfrak{M}(e)$, no function symbol
is recursively defined based on $\mathfrak{M}(f)$.

Definition 1. We say a structure $A$ with signature $\Sigma$ conforms to a set of
function definitions $P$ iff for each definition $f(x_1, \ldots, x_n) = e$ in $P$,
$[\![f(x_1, \ldots, x_n) = e]\!]^{A}_{a}$ is T, where $a$ is the assignment of $A$.

That the structure $A$ conforms to $P$ means that $A$ interprets the defined function
symbols according to their definitions in $P$.

In our logic, the function symbol * is used to model program states. The 
DRFs used to retrieve abstract values are defined on *. One of the basic ideas of 
our logic is that the abstract values retrieved by these functions keep unchanged 
if no memory unit in their memory scopes is over-written during a program 
execution. We have the following lemma and theorem about MSFs and memory 
scope forms of terms. 
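The idea that a retrieved value is stable while its memory scope is untouched can be tried out on a small example. The Python sketch below is only an illustration under stated assumptions: the heap is a dictionary from integer addresses to values, a list node occupies two consecutive units (a hypothetical layout with `val` at offset 0 and `next` at offset 1), and `list_sum` / `list_sum_scope` play the roles of a DRF and of the MSF derived from it by the rules above.

```python
NIL = None

def val_addr(x):
    """&x -> val (assumed layout: field val at offset 0)."""
    return x

def next_addr(x):
    """&x -> next (assumed layout: field next at offset 1)."""
    return x + 1

def list_sum(heap, x):
    """DRF: Sum(x) = (x = nil) ? 0 : x->val + Sum(x->next)."""
    if x is NIL:
        return 0
    return heap[val_addr(x)] + list_sum(heap, heap[next_addr(x)])

def list_sum_scope(heap, x):
    """MSF obtained from the DRF: the set of memory units list_sum reads."""
    if x is NIL:
        return set()
    return {val_addr(x), next_addr(x)} | list_sum_scope(heap, heap[next_addr(x)])

# Two list nodes at 10 and 20, plus an unrelated node at 30.
heap = {10: 3, 11: 20, 20: 4, 21: NIL, 30: 99, 31: NIL}
before = list_sum(heap, 10)
scope = list_sum_scope(heap, 10)
heap[30] = -1                      # overwrite a unit outside the scope
after = list_sum(heap, 10)
```

Here the write to address 30 lies outside `scope`, so `before` and `after` agree; a write to any address inside `scope` (for example 20) would in general change the retrieved value.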

Lemma 1. Let $P$ be a set of recursive function definitions. Let $A$ and $A'$ be
two structures. They both conform to $P$ and are identical except that they may
have different interpretations for $*$ and for the function symbols defined in $P$.
Let $e$ be a term such that all function symbols in $e$ are either defined in $P$,
or associated with basic types, abstract types or P-types. We have that
$[\![e]\!]^{A}_{a} = [\![e]\!]^{A'}_{a}$ and $[\![\mathfrak{M}(e)]\!]^{A}_{a} = [\![\mathfrak{M}(e)]\!]^{A'}_{a}$ if
$[\![\mathfrak{M}(e)]\!]^{A}_{a} \neq \bot$ and $*^{A}(x) = *^{A'}(x)$ for all
$x \in [\![\mathfrak{M}(e)]\!]^{A}_{a}$.



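Proof. By induction, we first prove that the conclusion holds when $e$ contains
no function symbol defined in $P$.

BASE: The conclusion holds if $e$ is a variable or a constant symbol.

INDUCTION: Assume the conclusion holds for all terms shorter than $e$.
We prove that $[\![e]\!]^{A}_{a} = [\![e]\!]^{A'}_{a}$ and $[\![\mathfrak{M}(e)]\!]^{A}_{a} = [\![\mathfrak{M}(e)]\!]^{A'}_{a}$ if
$[\![\mathfrak{M}(e)]\!]^{A}_{a} \neq \bot$ and $*^{A}(x) = *^{A'}(x)$ for all $x \in [\![\mathfrak{M}(e)]\!]^{A}_{a}$.

- If $e$ is of the form $f(e_1, \ldots, e_n)$, where $f$ is a function symbol other than $*$
and $f$ is not defined in $P$: $\mathfrak{M}(e)$ is $\mathfrak{M}(e_1) \cup \ldots \cup \mathfrak{M}(e_n)$ because $\mathfrak{M}(f)$ is $\emptyset$.
So $[\![\mathfrak{M}(e_i)]\!]^{A}_{a} \neq \bot$ and $*^{A}(x) = *^{A'}(x)$ for all $x \in [\![\mathfrak{M}(e_i)]\!]^{A}_{a}$, for $i = 1, \ldots, n$.
According to the inductive assumption, $[\![e_i]\!]^{A}_{a} = [\![e_i]\!]^{A'}_{a}$ and
$[\![\mathfrak{M}(e_i)]\!]^{A}_{a} = [\![\mathfrak{M}(e_i)]\!]^{A'}_{a}$. It follows that $[\![e]\!]^{A}_{a} = [\![e]\!]^{A'}_{a}$ and
$[\![\mathfrak{M}(e)]\!]^{A}_{a} = [\![\mathfrak{M}(e)]\!]^{A'}_{a}$ because $f^{A} = f^{A'}$.

- If $e$ is of the form $*e_1$: $\mathfrak{M}(e)$ is defined as $\{e_1\} \cup \mathfrak{M}(e_1)$. So $[\![\mathfrak{M}(e_1)]\!]^{A}_{a} \neq \bot$
and $*^{A}(x) = *^{A'}(x)$ for all $x \in [\![\mathfrak{M}(e_1)]\!]^{A}_{a}$. From the inductive assumption,
we have that $[\![e_1]\!]^{A}_{a} = [\![e_1]\!]^{A'}_{a}$ and $[\![\mathfrak{M}(e_1)]\!]^{A}_{a} = [\![\mathfrak{M}(e_1)]\!]^{A'}_{a}$.
Because $[\![e_1]\!]^{A}_{a} \in [\![\mathfrak{M}(e)]\!]^{A}_{a}$ if $[\![\mathfrak{M}(e)]\!]^{A}_{a}$ is not $\bot$, we have
$[\![e]\!]^{A}_{a} = *^{A}([\![e_1]\!]^{A}_{a}) = *^{A'}([\![e_1]\!]^{A'}_{a}) = [\![e]\!]^{A'}_{a}$ and
$[\![\mathfrak{M}(e)]\!]^{A}_{a} = [\![\mathfrak{M}(e)]\!]^{A'}_{a}$.

- If $e$ is of the form $e_0\,?\,e_1 : e_2$: $\mathfrak{M}(e)$ is $\mathfrak{M}(e_0) \cup (e_0\,?\,\mathfrak{M}(e_1) : \mathfrak{M}(e_2))$. From the
inductive assumption, we have $[\![e_0]\!]^{A}_{a} = [\![e_0]\!]^{A'}_{a}$ and $[\![\mathfrak{M}(e_0)]\!]^{A}_{a} = [\![\mathfrak{M}(e_0)]\!]^{A'}_{a}$.
So $[\![e_0]\!]^{A}_{a} = T$ iff $[\![e_0]\!]^{A'}_{a} = T$. When $[\![e_0]\!]^{A}_{a} = [\![e_0]\!]^{A'}_{a} = T$, we have
$[\![\mathfrak{M}(e)]\!]^{A}_{a} = [\![\mathfrak{M}(e_0)]\!]^{A}_{a} \cup [\![\mathfrak{M}(e_1)]\!]^{A}_{a}$,
$[\![\mathfrak{M}(e)]\!]^{A'}_{a} = [\![\mathfrak{M}(e_0)]\!]^{A'}_{a} \cup [\![\mathfrak{M}(e_1)]\!]^{A'}_{a}$,
$[\![e]\!]^{A}_{a} = [\![e_1]\!]^{A}_{a}$ and $[\![e]\!]^{A'}_{a} = [\![e_1]\!]^{A'}_{a}$. From the inductive assumption,
we have that $[\![e]\!]^{A}_{a} = [\![e]\!]^{A'}_{a}$ and $[\![\mathfrak{M}(e)]\!]^{A}_{a} = [\![\mathfrak{M}(e)]\!]^{A'}_{a}$. We can also prove that
$[\![e]\!]^{A}_{a} = [\![e]\!]^{A'}_{a}$ and $[\![\mathfrak{M}(e)]\!]^{A}_{a} = [\![\mathfrak{M}(e)]\!]^{A'}_{a}$ when $[\![e_0]\!]^{A}_{a}$ is F or N.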

Second, we prove that the conclusion holds if no function symbol defined in
$P$ is (directly or indirectly) recursively defined on itself. We give a rank to each
term and each function symbol. The rank of a term $e$ is the highest rank of the
function symbols occurring in $e$. The ranks of function symbols associated with basic
types and abstract types are 0. The function symbols $*$, $\&\!\rightarrow n$, $\&[\,]$ also have
rank 0. The rank of a function symbol $f$ defined as $f(x_1, \ldots, x_n) = e_r$ in $P$ is
the rank of $e_r$ plus 1. As no function symbol is recursively defined, each function
symbol and term has a rank. Now, the conclusion is proved by induction on
the ranks and the lengths of terms.

BASE: According to the conclusion of the first step, this conclusion holds for
0-rank terms of any length.

INDUCTION: Let $e$ be a $k$-rank term. Assume the conclusion holds for all terms
with a rank less than $k$, and for all $k$-rank terms shorter than $e$.

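- If $e$ is of the form $f(e_1, \ldots, e_n)$ and $f$ is a function symbol with a rank not
greater than $k$, defined as $f(x_1, \ldots, x_n) = e_r$: then the rank of $e_r$ is less
than or equal to $k - 1$. As all the function-definition formulae in $P$ are
interpreted to T, according to the semantic model of function definitions of LPF,
both $[\![f(e_1, \ldots, e_n)]\!]^{A}_{a}$ and $[\![f(e_1, \ldots, e_n)]\!]^{A'}_{a}$ are $\bot$ if some $[\![e_i]\!]^{A}_{a}$ is $\bot$.
Otherwise, $[\![f(e_1, \ldots, e_n)]\!]^{A}_{a}$ and $[\![f(e_1, \ldots, e_n)]\!]^{A'}_{a}$ are $[\![e_r]\!]^{A}_{a'}$ and $[\![e_r]\!]^{A'}_{a'}$
respectively, where $a' = a(x_1 \rightarrow [\![e_1]\!]^{A}_{a}) \ldots (x_n \rightarrow [\![e_n]\!]^{A}_{a})$, i.e. $a'$ is the same as $a$ except
that $a'$ maps $x_i$ to $[\![e_i]\!]^{A}_{a}$; $[\![\mathfrak{M}(f)(e_1, \ldots, e_n)]\!]^{A}_{a}$ and $[\![\mathfrak{M}(f)(e_1, \ldots, e_n)]\!]^{A'}_{a}$ are
$[\![\mathfrak{M}(e_r)]\!]^{A}_{a'}$ and $[\![\mathfrak{M}(e_r)]\!]^{A'}_{a'}$ respectively. Because $[\![\mathfrak{M}(f)(e_1, \ldots, e_n)]\!]^{A}_{a} \subseteq [\![\mathfrak{M}(e)]\!]^{A}_{a}$,
we have $*^{A}(x) = *^{A'}(x)$ for all $x \in [\![\mathfrak{M}(f)(e_1, \ldots, e_n)]\!]^{A}_{a} = [\![\mathfrak{M}(e_r)]\!]^{A}_{a'}$. As
the rank of $e_r$ is less than or equal to $k - 1$, from the inductive assumption,
we have $[\![e_r]\!]^{A}_{a'} = [\![e_r]\!]^{A'}_{a'}$ and $[\![\mathfrak{M}(e_r)]\!]^{A}_{a'} = [\![\mathfrak{M}(e_r)]\!]^{A'}_{a'}$, i.e. $[\![e]\!]^{A}_{a} = [\![e]\!]^{A'}_{a}$ and
$[\![\mathfrak{M}(f)(e_1, \ldots, e_n)]\!]^{A}_{a} = [\![\mathfrak{M}(f)(e_1, \ldots, e_n)]\!]^{A'}_{a}$. So $[\![\mathfrak{M}(e)]\!]^{A}_{a} = [\![\mathfrak{M}(e)]\!]^{A'}_{a}$
because $[\![\mathfrak{M}(e_i)]\!]^{A}_{a} = [\![\mathfrak{M}(e_i)]\!]^{A'}_{a}$ and $[\![\mathfrak{M}(f)(e_1, \ldots, e_n)]\!]^{A}_{a} = [\![\mathfrak{M}(f)(e_1, \ldots, e_n)]\!]^{A'}_{a}$.

- If $e$ is a conditional form with rank $k$, the proof is similar to that of the
first step.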

Now we prove the general case. For each function symbol $f$ recursively
defined in $P$, we introduce an infinite number of function symbols $f_0, f_1, \ldots$.
For the definition of $f$, i.e. $f(x_1, \ldots, x_n) = e$, we introduce a set of definitions
$f_i(x_1, \ldots, x_n) = e_i$ for $i = 1, 2, \ldots$, where $e_i$ is derived by replacing each
function symbol $g$ recursively defined in $P$ by $g_{i-1}$ (here $g$ can be $f$) in $e$. We also
introduce a function definition $f_0(x_1, \ldots, x_n) = $ © for each $f_0$. Notice that the $f_i$
are not recursively defined. Furthermore, $\mathfrak{M}(e_i)$ is the same as the term derived by
replacing $g$ and $\mathfrak{M}(g)$ respectively by $g_{i-1}$ and $\mathfrak{M}(g_{i-1})$ in $\mathfrak{M}(e)$. Because it is
required that for each definition $f(x_1, \ldots, x_n) = e$ in $P$, $e$ is continuous in $f$,
we have that $f_i$ is less defined than $f_{i+1}$, i.e. $f_{i+1}(x_1, \ldots, x_n) = f_i(x_1, \ldots, x_n)$ if
$f_i(x_1, \ldots, x_n)$ is defined, for any $x_1, \ldots, x_n$. Because $\mathfrak{M}(e)$ is continuous in $\mathfrak{M}(f)$,
we have that $\mathfrak{M}(f_i)^{A}$ is less defined than $\mathfrak{M}(f_{i+1})^{A}$. So $f^{A}$ is the least upper bound
of the function sequence $f_0^{A}, f_1^{A}, \ldots$, and $\mathfrak{M}(f)^{A}$ is the least upper bound of the
function sequence $\mathfrak{M}(f_0)^{A}, \mathfrak{M}(f_1)^{A}, \ldots$. Let $e$ be a term containing recursively
defined function symbols. If $[\![e]\!]^{A}_{a}$ is not $\bot$, there must be a large enough integer
$i$ such that $[\![e_i]\!]^{A}_{a} = [\![e]\!]^{A}_{a}$ and $[\![\mathfrak{M}(e_i)]\!]^{A}_{a} = [\![\mathfrak{M}(e)]\!]^{A}_{a}$, where $e_i$ is derived by replacing
each recursively defined function symbol $g$ by $g_{i-1}$. As $e_i$ contains no recursively
defined symbols, according to the second conclusion, this lemma
holds in general. QED
□

Theorem 1 below gives a sufficient condition under which the value of an LPF
formula $p$ remains unchanged when some memory units are modified. A
term occurring in $p$ is called a top-level one if it is not a sub-term of another term
occurring in $p$.

Theorem 1. Let $P$ be a set of recursive function definitions. Let $A$ and $A'$ be
two structures. They both conform to $P$ and are identical except that they may
have different interpretations for $*$ and for the function symbols defined in $P$.
Let $p$ be an LPF formula satisfying that

- all function symbols in $p$ are either defined in $P$, or associated with basic
types, abstract types, or P-types, and

- $p$ has no sub-formula of the form $f(x_1, \ldots, x_n) = e_r$.

We have that $[\![p]\!]^{A}_{a} = [\![p]\!]^{A'}_{a}$ if $[\![\mathfrak{M}(e)]\!]^{A}_{a'} \neq \bot$ and $*^{A}(x) = *^{A'}(x)$ for all
$x \in [\![\mathfrak{M}(e)]\!]^{A}_{a'}$, for each top-level term $e$ of $p$ and arbitrary assignment $a'$.

Proof. This theorem can be proved by induction on the structure of $p$.

BASE:

- If $p$ is of the form $f(e_1, \ldots, e_n)$, and $f$ is a boolean-typed function symbol
(or a predicate symbol), $p$ itself is the only top-level term of $p$. From
Lemma 1, $[\![p]\!]^{A}_{a} = [\![p]\!]^{A'}_{a}$.

- If $p$ is of the form $e_1 = e_2$: from Lemma 1, $[\![e_i]\!]^{A}_{a} = [\![e_i]\!]^{A'}_{a}$ for $i = 1, 2$. So
$[\![p]\!]^{A}_{a} = [\![p]\!]^{A'}_{a}$.

- If $p$ is of the form $e : t$: from Lemma 1, $[\![e]\!]^{A}_{a} = [\![e]\!]^{A'}_{a}$, and $t^{A} = t^{A'}$. So
$[\![p]\!]^{A}_{a} = [\![p]\!]^{A'}_{a}$.

INDUCTION:

- If $p$ is of the form $\forall x : t \cdot p'$: a top-level term of $p$ is also a top-level term of $p'$.
From the inductive assumption, for an assignment $a(x \rightarrow v)$ with an arbitrary
$t$-typed value $v$, $[\![p']\!]^{A}_{a(x \rightarrow v)} = [\![p']\!]^{A'}_{a(x \rightarrow v)}$. According to the interpretation
rule for $\forall x : t \cdot p'$, we conclude that $[\![p]\!]^{A}_{a} = [\![p]\!]^{A'}_{a}$.

- The conclusion can also be proved when $p$ is of the form $\Delta p'$, $\neg p'$, or $p_1 \wedge p_2$.

QED

□

4 Syntax of programs 

The small programming language used in this paper is strongly typed. Each expression
in a program has a static P-type. That an expression $e$ has static P-type $t$ means
that at runtime, either $e$ denotes a value of type $t$ or $e$ is non-denoting. The
argument types and result types of the function symbols that appear in programs are
fully specified. The static types of expressions can be determined statically and
automatically. It can also be checked statically (by a compiler, for example) that
each function symbol is applied to arguments with suitable static types. In this
paper, we assume that all programs under verification have passed such a static
type check.

4.1 The syntax of program expressions 

A program expression is an LPF term with the following restrictions.

1. A program expression contains no free variable. Note that a program
variable $v$ occurring in a term is in fact an abbreviation for $*(\&v)$.

2. Only the following function (predicate) symbols can occur in program
expressions.

(a) Constant symbols for basic types (integer, boolean), $nil_t$ for type $t$,
$\&v$ for a program variable $v$;

(b) Function symbols associated with integer and boolean, like $+, -, *, \div, <$;

(c) Memory access/layout function symbols $*$, $\&\!\rightarrow n$, $\&[\,]$;

(d) Boolean functions not, cand, cor, which are defined using conditional
forms as follows.

i. not x = x ? false : true

ii. x cand y = $\neg$x ? false : y

iii. x cor y = x ? true : y.

We define these boolean operators because the semantics of the logical connectives $\wedge$
and $\vee$ of LPF is different from that of the logical operators commonly used in
programming languages.
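The difference can be made concrete with a three-valued model. The following Python sketch is an illustration under assumptions that the paper does not spell out here: an undefined (non-denoting) value is modelled as `None`, the conditional form is taken to be undefined when its condition is undefined, and only the selected branch is evaluated (branches are passed as zero-argument functions). The names `cond`, `not_`, `cand` and `cor` are hypothetical.

```python
UNDEF = None  # stands for a non-denoting (undefined) value

def cond(p, then_branch, else_branch):
    """p ? e1 : e2, assumed undefined whenever the condition p is undefined."""
    if p is UNDEF:
        return UNDEF
    return then_branch() if p else else_branch()

def not_(x):
    """not x = x ? false : true"""
    return cond(x, lambda: False, lambda: True)

def cand(x, y):
    """x cand y = (not x) ? false : y, where y is evaluated only if needed."""
    return cond(not_(x), lambda: False, y)

def cor(x, y):
    """x cor y = x ? true : y"""
    return cond(x, lambda: True, y)
```

With these definitions `cand(False, lambda: UNDEF)` is `False`, but `cand(UNDEF, lambda: False)` is undefined, so the operators are sensitive to argument order in the way short-circuit operators in programming languages are. The symmetric connectives of LPF, by contrast, would treat both cases alike.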



4.2 The syntax of program statements 

The syntax of program statements is as follows. 

st ::= skip | *e_1 := e_2 | *e := alloc(t)
     | st; st | if (e) st else st
     | while (e) st

This programming language has two kinds of primitive statements: assignment 
statements and memory-allocation statements. 

- An assignment statement $*e_1 := e_2$ first evaluates $e_1$ and $e_2$, then assigns
the value of $e_2$ to the memory unit referred to by the value of $e_1$. The values
stored in other memory units remain unchanged. It is required that $*e_1$ and
$e_2$ have the same static type, which is limited to integer, boolean, or a pointer
type.

- A memory-allocation statement $*e := alloc(t)$ allocates a memory block of
type $t$, and assigns the reference to this memory block to the memory unit
referred to by the value of $e$. Furthermore, in the new memory block, all the
memory units storing pointer values are initialized to nil. It is required that
the static type of $*e$ is $P(t)$.

The semantics of the composite statements st; st, if (e) st else st, and
while (e) st are the same as those commonly used in real programming languages. It
is required that in if (e) st else st and while (e) st, the static type of e
is boolean.
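The two primitive statements can be read as updates of the interpretation of `*`. The Python sketch below illustrates that reading under simplifying assumptions: memory is a dictionary from integer addresses to values, nil is `None`, fresh blocks are handed out at bases 1000, 1100, ... (so blocks are assumed to have fewer than 100 units), and non-pointer units of a new block are set to 0 although the text leaves their initial content open. The class `State` and its methods are hypothetical names.

```python
import itertools

class State:
    """Program state: the interpretation of * as a dict from addresses to values."""

    def __init__(self):
        self.mem = {}
        self._bases = itertools.count(1000, 100)  # fresh block base addresses

    def assign(self, addr, value):
        """*e1 := e2, with e1 and e2 already evaluated to addr and value."""
        if addr is None:
            raise ValueError("assignment through nil")
        self.mem[addr] = value  # every other memory unit is left unchanged

    def alloc(self, addr, size, pointer_offsets=()):
        """*e := alloc(t): allocate a fresh block and store its reference at addr."""
        base = next(self._bases)
        for off in range(size):
            # units holding pointers start as nil; the rest get an arbitrary 0
            self.mem[base + off] = None if off in pointer_offsets else 0
        self.assign(addr, base)
        return base
```

A usage example: after `s = State(); s.mem[1] = None; b = s.alloc(1, 4, pointer_offsets=(0, 1))`, the unit at address 1 holds the reference `b`, and the two pointer fields of the new block are nil.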

Example 1. The program depicted in Figure 1 is a running example used in
this paper. The type of the program variables k and d is integer. The type of the
program variables root and p is P(T), where T is REC((l, P(T)) × (r, P(T)) ×
(K, integer) × (D, integer)). This program first searches a binary search tree
for a node whose field K equals k. Then it sets the field D of this node
to d. Note that p, root, k, d, p → K, p → D, p → l, p → r
are respectively abbreviations for *(&p), *(&root), *(&k), *(&d), *(&p → K),
*(&p → D), *(&p → l), *(&p → r).



p := root;
while (p → K ≠ k)
{
    if (k < p → K) p := p → l else p := p → r;
}
p → D := d;



Fig. 1. The program used as a running example 



5 Syntax of specifications 



A program specification is of the form $P \vdash q\{c\}r$, where $c$ is a program, $P$ is a set
of LPF formulae, and $q$ and $r$ are LPF formulae satisfying the following conditions.

- They contain only function symbols defined in $P$, the function symbols which
can occur in program expressions, and the function symbols associated with
abstract types.

- $q$ and $r$ contain no sub-formula of the form $f(x_1, \ldots, x_n) = e$.

The formula set $P$ is called the premise of this specification. $P$ usually contains a
set of function definitions. The formulae $q$ and $r$ are respectively called the
pre-condition and post-condition. Intuitively speaking, such a specification means
that if all the formulae in $P$ hold for arbitrary program states, and the program
$c$ starts its execution in a state satisfying $q$, then the state satisfies $r$ when the
program $c$ stops.

Example 2. Let $P$ be the set of formulae depicted in Figure 2. These formulae
define a set of data retrieve functions. The boolean function InHeap is defined in
sub-section 6.3. InHeap(x) means that x refers to a memory block disjoint from
all memory blocks for program variables. Let

q = isHBST(root) ∧ Map(root) = M ∧ k ∈ Dom(root)

r = isHBST(root) ∧ Map(root) = M † {k ↦ d}

Prog is the program depicted in Figure 1. The specification $P \vdash q\{Prog\}r$ says
the following. Suppose the program state satisfies these conditions when Prog starts:

1. The value of root points to the root node of a binary search tree stored in
the heap;

2. The tree represents a finite map M from integer to integer;

3. The value stored in k is in the domain of this map.

Then, when Prog stops, root still points to the root node of the binary search tree, and
the finite map represented by the binary search tree has become M † {k ↦ d}.

6 Proof rules of program statements 

In this section, we present the proof rules for program statements. There are 
three rules for primitive statements, one rule for consequences, and three rules 
for control flow statements. 

6.1 The proof rule for skip statement 

The skip statement changes nothing, so we have the following proof rule. 



\[
\text{SKIP-ST}\quad
\frac{}{P \vdash q\,\{\mathtt{skip}\}\,q}
\]



NodeSet(x) : P(T) → SetOf(Ptr)
  = (x = nil) ? ∅ : ({x} ∪ NodeSet(x → l) ∪ NodeSet(x → r))

Map(x) : P(T) → Map integer to integer
  = (x = nil) ? ∅ : {x → K ↦ x → D} † Map(x → l) † Map(x → r)

MapP(x, y) : P(T) × P(T) → Map integer to integer
  = (x = nil) ? ∅ : MapP(x → l, y) † MapP(x → r, y) †
                     ((x = y) ? ∅ : {x → K ↦ x → D})

Dom(x) : P(T) → SetOf(integer)
  = (x = nil) ? ∅ : ({x → K} ∪ Dom(x → l) ∪ Dom(x → r))

isHBST(x) : P(T) → boolean
  = (x = nil) ? true : InHeap(x) ∧ isHBST(x → l) ∧ isHBST(x → r) ∧
      (Dom(x → l) = ∅ ? true : MAX(Dom(x → l)) < x → K) ∧
      (Dom(x → r) = ∅ ? true : x → K < MIN(Dom(x → r)))



Fig. 2. The definitions of a set of data retrieve functions 
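The data-retrieve functions of Figure 2 can be read as ordinary recursive functions over linked nodes. The following is a minimal Python sketch under stated assumptions: the class `Node` and the names `node_set`, `map_of`, `dom` and `is_hbst` are hypothetical, nil is modelled by `None`, a finite map by a `dict`, and the override operator by dictionary merging. The `InHeap` conjunct is omitted because every Python object already lives on the heap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # eq=False keeps identity-based equality, like pointer equality
class Node:
    K: int
    D: int
    l: Optional["Node"] = None
    r: Optional["Node"] = None


def node_set(x):
    """NodeSet(x): the set of nodes reachable from x."""
    return set() if x is None else {x} | node_set(x.l) | node_set(x.r)


def map_of(x):
    """Map(x): {x.K -> x.D} overridden by Map(x.l), then by Map(x.r)."""
    return {} if x is None else {x.K: x.D, **map_of(x.l), **map_of(x.r)}


def dom(x):
    """Dom(x): the set of keys stored in the tree rooted at x."""
    return set() if x is None else {x.K} | dom(x.l) | dom(x.r)


def is_hbst(x):
    """isHBST(x) without the InHeap conjunct (see the assumptions above)."""
    if x is None:
        return True
    left, right = dom(x.l), dom(x.r)
    return (is_hbst(x.l) and is_hbst(x.r)
            and (not left or max(left) < x.K)
            and (not right or x.K < min(right)))


root = Node(5, 50, Node(3, 30), Node(8, 80))
print(is_hbst(root), map_of(root), dom(root))
```

Each function mirrors one conditional term of the figure, so the nil case of every definition becomes the `None` branch.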



6.2 The proof rule for assignment statements 

Let q be an LPF formula and x be the only free variable in q. Let t be the static
type of $*e_1$ and $e_2$. The type t must be integer, boolean, or P(t') for some t'.
We have the following proof rule for assignment statements.



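ASSIGN-ST

\[
\frac{\begin{array}{l}
P,\ q[e_2/x] \vdash e_1 \neq \mathrm{nil} \land e_1 \notin \mathfrak{M}(e_1) \land e_2 : t\\
P,\ q[e_2/x] \vdash e_1 \notin \mathfrak{M}(e)[e_2/x] \ \text{ for each top-level term } e \text{ of } q
\end{array}}
{P \vdash q[e_2/x]\ \{{*e_1} := e_2\}\ q[{*e_1}/x]}
\]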



Here, it is required that all bound variables in q are different from x. A term
e of q is called a top-level term if it is not a sub-term of another term of q.
Furthermore, it is required that for each conditional term $e_0\ ?\ e_1 : e_2$ of q, $e_0$ is a
boolean-typed term, so that we can construct a memory form of each top-level term
of q.

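Now we briefly prove the soundness of this rule. We use two structures
$A$ and $A'$ to denote the program states before and after the assignment statement.
$A$ and $A'$ differ only in the interpretations of the function symbol $*$ and
the symbols defined in $P$. The semantics of an assignment $*e_1 := e_2$ is as follows.
It first evaluates $e_1$ and $e_2$, i.e. $[\![e_1]\!]^{A}_{\alpha}$ and $[\![e_2]\!]^{A}_{\alpha}$; then the
content of the memory unit referred to by $[\![e_1]\!]^{A}_{\alpha}$ is set to $[\![e_2]\!]^{A}_{\alpha}$. Formally,
$*^{A'}([\![e_1]\!]^{A}_{\alpha}) = [\![e_2]\!]^{A}_{\alpha}$, and $*^{A'}(x) = *^{A}(x)$ for all $x \neq [\![e_1]\!]^{A}_{\alpha}$. According to
Lemma 1, the condition $e_1 \notin \mathfrak{M}(e_1)$ assures that $[\![e_1]\!]^{A}_{\alpha} = [\![e_1]\!]^{A'}_{\alpha}$, so
$[\![e_2]\!]^{A}_{\alpha} = *^{A'}([\![e_1]\!]^{A}_{\alpha}) = *^{A'}([\![e_1]\!]^{A'}_{\alpha}) = [\![*e_1]\!]^{A'}_{\alpha}$. The condition $e_1 \neq \mathrm{nil} \land e_2 : t$ assures that
both $[\![*e_1]\!]^{A'}_{\alpha}$ and $[\![e_2]\!]^{A}_{\alpha}$ are not $\bot$. Together with these conditions, the condition
$e_1 \notin \mathfrak{M}(e)[e_2/x]$ assures that for each top-level term $e$, $[\![e_1]\!]^{A}_{\alpha} \notin [\![\mathfrak{M}(e)[e_2/x]]\!]^{A}_{\alpha}$,
which equals $[\![\mathfrak{M}(e)]\!]^{A}_{\alpha(x \mapsto [\![e_2]\!]^{A}_{\alpha})}$. From Lemma HJ we have

\[ [\![e]\!]^{A}_{\alpha(x \mapsto [\![e_2]\!]^{A}_{\alpha})} = [\![e]\!]^{A'}_{\alpha(x \mapsto [\![e_2]\!]^{A}_{\alpha})} = [\![e]\!]^{A'}_{\alpha(x \mapsto [\![*e_1]\!]^{A'}_{\alpha})}. \]

So we have $[\![e[e_2/x]]\!]^{A}_{\alpha} = [\![e[{*e_1}/x]]\!]^{A'}_{\alpha}$. As $\alpha$ is
arbitrary, according to Theorem 1, $[\![q[e_2/x]]\!]^{A} = [\![q[{*e_1}/x]]\!]^{A'}$. So we conclude
that if $q[e_2/x]$ holds before the assignment statement, $q[{*e_1}/x]$ holds after it.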
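The key idea behind this rule, that a retrieved value is unchanged as long as no memory unit in its scope is overwritten, can be illustrated with a small executable model. The sketch below is an assumption-laden illustration, not the paper's formal semantics: memory is a Python dictionary from integer addresses to values, `Memory`, `retrieve` and `list_sum` are hypothetical names, and the memory scope is observed dynamically (by recording the addresses read) rather than derived syntactically as an MSF.

```python
NIL = 0


class Memory:
    """A flat memory: address -> value, with optional read tracing."""

    def __init__(self, cells):
        self.cells = dict(cells)
        self.reads = None  # a set of addresses while tracing, otherwise None

    def read(self, addr):
        if self.reads is not None:
            self.reads.add(addr)
        return self.cells[addr]

    def write(self, addr, value):
        self.cells[addr] = value


def retrieve(mem, drf, *args):
    """Evaluate a data-retrieve function; return (value, addresses read)."""
    mem.reads = set()
    try:
        value = drf(mem, *args)
        return value, frozenset(mem.reads)
    finally:
        mem.reads = None


def list_sum(mem, p):
    """A sample DRF: a list cell at address p stores its value at p and its
    next pointer at p + 1; the abstract value is the sum of the list."""
    if p == NIL:
        return 0
    return mem.read(p) + list_sum(mem, mem.read(p + 1))


mem = Memory({10: 1, 11: 20, 20: 2, 21: NIL, 30: 99})
value, scope = retrieve(mem, list_sum, 10)

mem.write(30, 7)                      # address 30 is outside the scope
unchanged, _ = retrieve(mem, list_sum, 10)

mem.write(20, 5)                      # address 20 is inside the scope
changed, _ = retrieve(mem, list_sum, 10)
```

The first write corresponds to the premise $e_1 \notin \mathfrak{M}(e)[e_2/x]$ being satisfied; the second write violates it, and the abstract value is allowed to change.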

6.3 The proof rule for memory allocation statements 

The memory allocation statement $*e := \mathrm{alloc}(t)$ first evaluates e, then allocates
an unused memory block and assigns the reference to this block to the memory
unit referred to by e. All the memory units storing pointer values are initialized
to nil. This block cannot be referred to by any pointer stored anywhere before
this allocation. Furthermore, this block is disjoint from all of the memory blocks
allocated for program variables. It is required that the static type of $*e$ is
P(t). Let q be an LPF formula containing no free variable. We have the following
proof rule for memory allocation statements.



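ALLOC-ST

\[
\frac{\begin{array}{l}
P,\ q \vdash e \neq \mathrm{nil} \land e \notin \mathfrak{M}(e)\\
P,\ q \vdash e \notin \mathfrak{M}(e') \ \text{ for each top-level term } e' \text{ of } q
\end{array}}
{P \vdash q\ \{{*e} := \mathrm{alloc}(t)\}\ (q \land \mathrm{InHeap}(*e) \land \mathrm{Unique}(e) \land \mathrm{PtrInit}(*e) \land {*e} \neq \mathrm{nil})}
\]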

The predicates Unique, InHeap, and PtrInit are defined as follows.

\[ \mathrm{Unique}(x) \triangleq \forall y : \mathrm{Ptr} \cdot ((y \neq x \land y \neq \mathrm{nil} \land {*y} : \mathrm{Ptr}) \Rightarrow \mathrm{Block}(*y) \cap \mathrm{Block}(*x) = \emptyset) \]

\[ \mathrm{InHeap}(p) \triangleq \bigwedge_{v \text{ is a program variable}} (\mathrm{Block}(\&v) \cap \mathrm{Block}(p) = \emptyset) \]

\[ \mathrm{PtrInit}(p) \triangleq \forall x : \mathrm{Ptr} \cdot ((x \in \mathrm{Block}(p) \land x \neq \mathrm{nil} \land {*x} : \mathrm{Ptr}) \Rightarrow {*x} = \mathrm{nil}) \]

Intuitively speaking, Unique(p) says that the memory block referred to by the
reference stored in p cannot be accessed through references stored elsewhere. InHeap(p)
says that the memory block referred to by p is disjoint from all the memory blocks
for program variables. PtrInit(p) says that all memory units with pointer types
in the memory block referred to by p store nil pointers.

Similarly to the soundness reasoning for the rule ASSIGN-ST, we can conclude
that q still holds after the allocation statement if it holds before. Because
the allocated memory block is unused, it cannot be accessed through any pointer
stored anywhere before this memory allocation. The allocation statement
assigns the reference to this block only to the memory unit referred to by e. So
Unique(e) holds after this allocation statement. The newly allocated block is
disjoint from all blocks for program variables. So InHeap(*e) holds after this
allocation statement. The post-condition PtrInit(*e) holds because the new block
is initialized as described above. So we conclude that this proof rule is sound.
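The post-condition of ALLOC-ST can be checked on a toy memory model. The following sketch is only an illustration under simplifying assumptions: memory is a dictionary from integer addresses to values, every cell is treated as pointer-typed, a fresh block is taken just above the highest used address, and the helper names `alloc`, `unique`, `in_heap` and `ptr_init` are hypothetical.

```python
NIL = 0


def alloc(cells, dest, size):
    """Model of *dest := alloc(t) for a type occupying `size` cells."""
    base = max(cells, default=0) + 1        # a block no existing cell occupies
    block = range(base, base + size)
    for addr in block:
        cells[addr] = NIL                   # pointer units are initialized to nil
    cells[dest] = base                      # only `dest` receives the reference
    return block


def unique(cells, dest, block):
    """No cell other than `dest` stores a reference into the block."""
    return all(v not in block for a, v in cells.items() if a != dest)


def in_heap(block, variable_addrs):
    """The block is disjoint from the cells of program variables."""
    return not (set(block) & set(variable_addrs))


def ptr_init(cells, block):
    """Every cell of the block stores nil."""
    return all(cells[a] == NIL for a in block)


variables = {1, 2}                          # addresses of two program variables
cells = {1: NIL, 2: NIL}
block = alloc(cells, 1, 4)
```

After the call, the four conjuncts added by the rule (InHeap, Unique, PtrInit, and the non-nil reference) can each be evaluated directly on `cells`.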



6.4 The consequence rule and the rules for control flow statements 



The following proof rules are essentially the same as those presented in [1].
The consequence rule is slightly modified such that the premise of a verified
assertion can be strengthened. The rules for the if-statement and the while-statement
are modified such that the pre-condition ensures that the condition expression
e evaluates to either T or F.



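CONSEQ

\[ \frac{P \vdash q\,\{s\}\,r \qquad P,\ q' \vdash q \qquad P,\ r \vdash r'}{P \vdash q'\,\{s\}\,r'} \]

SEQ-ST

\[ \frac{P \vdash q\,\{s_1\}\,r \qquad P \vdash r\,\{s_2\}\,r'}{P \vdash q\,\{s_1; s_2\}\,r'} \]

IF-ST

\[ \frac{P,\ q \vdash e \lor \neg e \qquad P \vdash (q \land e)\,\{s_1\}\,r \qquad P \vdash (q \land \neg e)\,\{s_2\}\,r}{P \vdash q\,\{\mathtt{if}\ (e)\ s_1\ \mathtt{else}\ s_2\}\,r} \]

WHILE-ST

\[ \frac{P,\ q \vdash e \lor \neg e \qquad P \vdash (q \land e)\,\{s\}\,q}{P \vdash q\,\{\mathtt{while}\ (e)\ s\}\,q \land \neg e} \]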



7 Verifying the running example 



In this section, we verify the program depicted in Figure 1.



7.1 The DRFs, MSFs and their properties. 

Example 3. Figure 2 shows the data-retrieve functions for specifying and verifying
the program depicted in Figure 1. From the proof rule SCOPE-FUNC presented
earlier, we can derive the definitions of all corresponding MSFs. The definitions
of MSFs depicted in Figure 3 are simplified but equivalent to those derived
directly by the rule SCOPE-FUNC. For conciseness, we write $\mathfrak{M}(\mathrm{NodeSet})$ as
$\mathrm{NS}_m$, $\mathfrak{M}(\mathrm{Map})$ as $\mathrm{MP}_m$, $\mathfrak{M}(\mathrm{MapP})$ as $\mathrm{MPP}_m$, $\mathfrak{M}(\mathrm{Dom})$ as $\mathrm{DM}_m$, and $\mathfrak{M}(\mathrm{isHBST})$ as
$\mathrm{HBST}_m$. Some properties of these DRFs and MSFs are depicted in Figure 4.
These properties can be proved in the extended LPF.



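\[
\begin{aligned}
&\mathrm{NS}_m(x) \triangleq (x = \mathrm{nil})\ ?\ \emptyset : (\{\&x \to l, \&x \to r\} \cup \mathrm{NS}_m(x \to l) \cup \mathrm{NS}_m(x \to r))\\
&\mathrm{MP}_m(x) \triangleq (x = \mathrm{nil})\ ?\ \emptyset :\\
&\qquad \{\&x \to l, \&x \to r, \&x \to D, \&x \to K\} \cup \mathrm{MP}_m(x \to l) \cup \mathrm{MP}_m(x \to r)\\
&\mathrm{MPP}_m(x, y) \triangleq (x = \mathrm{nil})\ ?\ \emptyset :\\
&\qquad \{\&x \to l, \&x \to r\} \cup \mathrm{MPP}_m(x \to l, y) \cup \mathrm{MPP}_m(x \to r, y)\ \cup\\
&\qquad ((x = y)\ ?\ \emptyset : \{\&x \to K, \&x \to D\})\\
&\mathrm{DM}_m(x) \triangleq (x = \mathrm{nil})\ ?\ \emptyset : (\{\&x \to K, \&x \to l, \&x \to r\} \cup \mathrm{DM}_m(x \to l) \cup \mathrm{DM}_m(x \to r))\\
&\mathrm{HBST}_m(x) \triangleq (x = \mathrm{nil})\ ?\ \emptyset : \{\&x \to l, \&x \to r\} \cup \mathrm{HBST}_m(x \to l) \cup \mathrm{HBST}_m(x \to r)\ \cup\\
&\qquad \mathrm{DM}_m(x \to l) \cup (\mathrm{Dom}(x \to l) = \emptyset\ ?\ \emptyset : \{\&x \to l\} \cup \{\&x \to K\} \cup \mathrm{DM}_m(x \to l))\ \cup\\
&\qquad \mathrm{DM}_m(x \to r) \cup (\mathrm{Dom}(x \to r) = \emptyset\ ?\ \emptyset : \{\&x \to r\} \cup \{\&x \to K\} \cup \mathrm{DM}_m(x \to r))
\end{aligned}
\]

Fig. 3. The definitions of MSFs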



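\[
\begin{aligned}
&P,\ \mathrm{isHBST}(x) \vdash \&p \notin \mathrm{HBST}_m(x) \cup \mathrm{MP}_m(x) \cup \mathrm{DM}_m(x) &&(1)\\
&P,\ \mathrm{isHBST}(x) \vdash \&p \to D \notin \mathrm{HBST}_m(x) \cup \mathrm{MPP}_m(x, p) \cup \mathrm{DM}_m(x) &&(2)\\
&P,\ \mathrm{isHBST}(x),\ y \in \mathrm{Dom}(x),\ y < x \to K \vdash y \in \mathrm{Dom}(x \to l) &&(3)\\
&P,\ \mathrm{isHBST}(x),\ y \in \mathrm{Dom}(x),\ y > x \to K \vdash y \in \mathrm{Dom}(x \to r) &&(4)\\
&P,\ \mathrm{isHBST}(x),\ y \in \mathrm{NodeSet}(x) \vdash \mathrm{Map}(x) = \mathrm{MapP}(x, y) \dagger \{y \to K \mapsto y \to D\} &&(5)\\
&P,\ \mathrm{NodeSet}(x) : \mathrm{SetOf}(\mathrm{Ptr}) \vdash x \in \mathrm{NodeSet}(x) &&(6)
\end{aligned}
\]

Fig. 4. Some properties about DRFs and MSFs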
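Property (5) is the one that later lets an update of a single field be read as an update of the abstract map. As a hedged illustration, the sketch below tests it on a small tree in Python; the class `Node` and the functions `map_of`, `map_p` and `nodes` are hypothetical stand-ins for Map, MapP and NodeSet, with nil modelled by `None` and the override operator by dictionary merging.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity-based equality, like pointer comparison
class Node:
    K: int
    D: int
    l: Optional["Node"] = None
    r: Optional["Node"] = None


def map_of(x):
    """Map(x) from Figure 2."""
    return {} if x is None else {x.K: x.D, **map_of(x.l), **map_of(x.r)}


def map_p(x, y):
    """MapP(x, y): the map of the tree at x, skipping the entry of node y."""
    if x is None:
        return {}
    own = {} if x is y else {x.K: x.D}
    return {**map_p(x.l, y), **map_p(x.r, y), **own}


def nodes(x):
    """The nodes reachable from x, as a list."""
    return [] if x is None else [x] + nodes(x.l) + nodes(x.r)


root = Node(5, 50, Node(3, 30), Node(8, 80))

# Property (5): Map(x) = MapP(x, y) overridden by {y.K -> y.D}, for every node y.
property_5 = all(map_of(root) == {**map_p(root, y), y.K: y.D} for y in nodes(root))

# Writing y.D leaves MapP(root, y) untouched, since MapP never reads that field of y.
y = root.r
before = map_p(root, y)
y.D = 81
frame_ok = map_p(root, y) == before and map_of(root) == {**before, y.K: 81}
```

The second check mirrors property (2): the unit `&p→D` is outside the scope of `MapP(x, p)`, so the auxiliary value survives the assignment.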



7.2 Verifying the program 

In this section, we prove that if root points to a binary search tree, this
binary tree is viewed as a finite map, and k is in the domain of this map, then the
program depicted in Figure 1 sets the co-value of k to d. In this section, we use
P to denote the set of the function definitions in Figure 2. The specification is
as follows.

\[ P \vdash \text{PRE-COND}\ \{\mathrm{Prog}\}\ \mathrm{isHBST}(\mathrm{root}) \land \mathrm{Map}(\mathrm{root}) = M \dagger \{k \mapsto d\} \]

Here, PRE-COND is the abbreviation for $\mathrm{isHBST}(\mathrm{root}) \land \mathrm{Map}(\mathrm{root}) = M \land k \in \mathrm{Dom}(\mathrm{root})$,
and M is a constant of type Map integer to integer. The verification
steps are given below.

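From ASSIGN-ST, (1) and $\&p \notin \{\&\mathrm{root}, \&k\}$:

\[
P \vdash
\begin{array}{l}
(\text{PRE-COND} \land x \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(x))[\mathrm{root}/x]\\
\{p := \mathrm{root};\}\\
(\text{PRE-COND} \land x \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(x))[p/x]
\end{array}
\tag{7}
\]

From (6), (7) and CONSEQ:

\[ P \vdash \text{PRE-COND}\ \{p := \mathrm{root};\}\ \text{PRE-COND} \land p \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(p) \tag{8} \]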

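From ASSIGN-ST, (1) and $\&p \notin \{\&\mathrm{root}, \&k\}$:

\[
P \vdash
\begin{array}{l}
(\text{PRE-COND} \land x \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(x))[p \to l/x]\\
\{p := p \to l;\}\\
(\text{PRE-COND} \land x \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(x))[p/x]
\end{array}
\tag{9}
\]

From (3), substitution:

\[
\begin{array}{l}
P,\ \text{PRE-COND},\ p \in \mathrm{NodeSet}(\mathrm{root}),\ k \in \mathrm{Dom}(p),\ k < p \to K \vdash\\
\quad p \to l \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(p \to l)
\end{array}
\tag{10}
\]

From (9), (10) and CONSEQ:

\[
P \vdash
\begin{array}{l}
\text{PRE-COND} \land p \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(p) \land k < p \to K\\
\{p := p \to l;\}\\
\text{PRE-COND} \land p \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(p)
\end{array}
\tag{11}
\]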

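Similarly, we can prove:

\[
P \vdash
\begin{array}{l}
\text{PRE-COND} \land p \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(p) \land k > p \to K\\
\{p := p \to r;\}\\
\text{PRE-COND} \land p \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(p)
\end{array}
\tag{12}
\]

$k \in \mathrm{Dom}(p)$ implies $p \neq \mathrm{nil}$, thus $k < p \to K \lor k > p \to K$. From IF-ST, (11)
and (12):

\[
P \vdash
\begin{array}{l}
\text{PRE-COND} \land p \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(p) \land p \to K \neq k\\
\{\mathtt{if}\ (k < p \to K)\ p := p \to l;\ \mathtt{else}\ p := p \to r;\}\\
\text{PRE-COND} \land p \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(p)
\end{array}
\tag{13}
\]

$k \in \mathrm{Dom}(p)$ implies that $p \neq \mathrm{nil}$, thus $p \to K \neq k \lor p \to K = k$. From WHILE-ST
and (13):

\[
P \vdash
\begin{array}{l}
\text{PRE-COND} \land p \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(p)\\
\{\text{the while statement}\}\\
\text{PRE-COND} \land p \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(p) \land p \to K = k
\end{array}
\tag{14}
\]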



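From (5) and the properties of finite maps:

\[
\begin{array}{l}
P,\ \mathrm{isHBST}(x),\ p \in \mathrm{NodeSet}(x),\ k = p \to K \vdash\\
\quad \mathrm{MapP}(x, p) \dagger \{p \to K \mapsto y\} = \mathrm{Map}(x) \dagger \{k \mapsto y\}
\end{array}
\tag{15}
\]

From (15), substitution:

\[
\begin{array}{l}
P,\ \text{PRE-COND},\ p \in \mathrm{NodeSet}(\mathrm{root}),\ k = p \to K \vdash\\
\quad (\mathrm{isHBST}(\mathrm{root}) \land \mathrm{MapP}(\mathrm{root}, p) \dagger \{p \to K \mapsto d\} = M \dagger \{k \mapsto d\})
\end{array}
\tag{16}
\]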

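From the rule ASSIGN-ST, (2) and $\&p \to D \notin \{\&\mathrm{root}, \&p, \&p \to K, \&k, \&d\}$:

\[
P \vdash
\begin{array}{l}
(\mathrm{isHBST}(\mathrm{root}) \land \mathrm{MapP}(\mathrm{root}, p) \dagger \{p \to K \mapsto x\} = M \dagger \{k \mapsto d\})[d/x]\\
\{p \to D := d\}\\
(\mathrm{isHBST}(\mathrm{root}) \land \mathrm{MapP}(\mathrm{root}, p) \dagger \{p \to K \mapsto x\} = M \dagger \{k \mapsto d\})[p \to D/x]
\end{array}
\tag{17}
\]

From the rule CONSEQ, (16) and (17):

\[
P \vdash
\begin{array}{l}
\text{PRE-COND} \land p \in \mathrm{NodeSet}(\mathrm{root}) \land k \in \mathrm{Dom}(p) \land p \to K = k\\
\{p \to D := d\}\\
\mathrm{isHBST}(\mathrm{root}) \land \mathrm{Map}(\mathrm{root}) = M \dagger \{k \mapsto d\}
\end{array}
\tag{18}
\]

From the rule SEQ-ST, (8), (14) and (18):

\[ P \vdash \text{PRE-COND}\ \{\mathrm{Prog}\}\ \mathrm{isHBST}(\mathrm{root}) \land \mathrm{Map}(\mathrm{root}) = M \dagger \{k \mapsto d\} \tag{19} \]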
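The proof outline above can be mirrored by runtime assertions. The sketch below is an illustration only: Figure 1 is not reproduced in this section, so the body of `prog` is inferred from the statements that appear in steps (7) to (18), and the names `Node`, `node_set`, `map_of`, `dom` and `prog` are hypothetical. The loop invariant asserted inside the loop is the one used in (13) and (14).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity-based equality, like pointer comparison
class Node:
    K: int
    D: int
    l: Optional["Node"] = None
    r: Optional["Node"] = None


def node_set(x):
    return set() if x is None else {x} | node_set(x.l) | node_set(x.r)


def map_of(x):
    return {} if x is None else {x.K: x.D, **map_of(x.l), **map_of(x.r)}


def dom(x):
    return set() if x is None else {x.K} | dom(x.l) | dom(x.r)


def prog(root, k, d):
    """Assumed shape of Prog: find the node with key k and overwrite its D."""
    assert k in dom(root)                            # part of PRE-COND
    p = root                                         # p := root
    while p.K != k:
        assert p in node_set(root) and k in dom(p)   # loop invariant
        p = p.l if k < p.K else p.r                  # the if statement of (13)
    assert p in node_set(root) and p.K == k          # post-condition of (14)
    p.D = d                                          # p->D := d


root = Node(5, 50, Node(3, 30), Node(8, 80))
M = map_of(root)
prog(root, 8, 99)
```

After the call, `map_of(root)` should equal `M` overridden by `{k: d}`, which is the post-condition of (19) for this one input; the assertions check a single run and do not replace the proof.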




8 Heuristics: virtual variables and pragmatic meaning of 
program statements 

Generally speaking, a pointer program may create an unbounded number of data
objects during its execution. These data objects are usually interconnected through
pointers. They are usually used to represent abstract values, which can be
retrieved using recursively defined DRFs. We can view a set of interconnected
data objects as a virtual variable, which holds an abstract value retrieved using
a DRF. Usually, such a data object set maintains a set of structural properties
during the program execution. These properties can also be expressed using a
set of boolean-typed DRFs. In our running example, the DRF isHBST
is used to state that a set of data objects forms a binary search tree, while the
DRF Map is used to retrieve a finite map from this binary search tree.

Usually, assigning a new value to such a virtual variable is performed by a
group of program statements. These statements change the values stored in a
small number of the data objects, and thus change the abstract value 'stored' in the
virtual variable. As to the structural properties, either none of the statements
changes their values, or some statements change their values but other
statements restore them afterwards. To reason about the effect of these statements
on the abstract value, we can define some auxiliary data-retrieve functions.

– These auxiliary DRFs do not access the memory units modified by these
statements. So the abstract values retrieved by these auxiliary DRFs remain
unchanged.

– The relationship between the abstract value retrieved by the main DRFs,
those retrieved by auxiliary DRFs, and the values stored in the modified
memory units can be proved based on the definitions of the DRFs. For
example, the property

\[ P,\ \mathrm{isHBST}(x),\ y \in \mathrm{NodeSet}(x) \vdash \mathrm{Map}(x) = \mathrm{MapP}(x, y) \dagger \{y \to K \mapsto y \to D\} \]

shows the relation between the main DRF Map, the auxiliary DRF MapP,
and the values stored in $\&y \to K$ and $\&y \to D$.

– The values retrieved by auxiliary DRFs remain unchanged. The effect of these
statements on the modified memory units can be derived relatively easily.
So, the effect of these statements on the abstract value retrieved by the main
DRF can be derived from the relations between main DRFs, auxiliary
DRFs and the values stored in the modified memory units.

To specify and verify these statements, we should understand and reason about these
statements as a whole, as they work together to assign a new value
to a virtual variable. Understanding the effects of such statement groups can
help us understand the whole program abstractly. In Appendix B we briefly
describe such an example. We call the effect of a group of program statements on
a virtual variable the pragmatic meaning of these statements. By first understanding
and verifying the pragmatic meanings of small statement groups, we
can then verify larger pieces of code step by step.

9 Conclusion and future works 

In this paper, we present an extension of Hoare logic for verification of pointer
programs. The pre-conditions and post-conditions are formulae of an extended
version of the LPF logic, which can deal with undefinedness, recursive function
definitions, and types. Program types and the function symbols ($*$, $\&{\to}n$, and $\&[\,]$)
associated with these types are introduced to model memory unit access and
memory layout for composite types. A set of proof rules is introduced to specify
these function symbols. Using these functions, people can deal with high-level
program types (record, array) directly.

People can define recursive functions to retrieve abstract values from concrete
interconnected data objects. We call these functions data-retrieve functions
(DRFs). Such functions can also be defined to specify the properties of data
structures. For each data-retrieve function f, we can derive the definition of its
corresponding memory-scope function (MSF) syntactically. When an abstract
value is retrieved by applying f to a set of arguments, applying the MSF of
f to the same arguments yields the set of memory units accessed during the
retrieval. As long as no memory unit in this set is modified during program
execution, applying f to the same arguments yields the same abstract value.

We present a new proof rule for assignment statements, and another rule for
memory allocation statements. The proof rule for assignment statements says
that after the assignment, the memory unit referred to by the left-hand side stores the
value of the right-hand side computed before the assignment. It also says that the
abstract values remain unchanged if the memory unit referred to by the left-hand side is
not in their memory scopes. The proof rule for memory allocation says that after
the allocation, the memory unit referred to by the left-hand side stores a reference to a
newly allocated memory block.

This logic has the following advantages. 

– This logic is easy to learn. Most of the knowledge encoded in this logic
has been (explicitly or implicitly) taught in undergraduate CS courses.
For example, the concepts of recursive functions and first-order logic are
already taught in undergraduate CS courses. The proof rules about program
variables, $*$, $\&{\to}n$, and $\&[\,]$ are taught informally in undergraduate courses
on programming languages and compilers.

– This logic supports reuse of proofs. Most of the proved properties of DRFs are
about data structures. They are independent of the code under verification.
So these properties can be reused in the verification of other code using the same
data structures. It is possible to build a library of pre-defined DRFs, MSFs,
and their properties.

– Verification can be performed at different levels of abstraction. A group of
statements changes the abstract value represented by a set of interconnected data
objects, but preserves the structural properties of these data objects. People can
first understand the pragmatic meaning of these statements, i.e. the effect of
these statements on the relevant abstract values. Then, they may view these
data objects as a virtual variable, and the statements as an abstract statement
assigning a new value to this virtual variable. Thus, people can reason about
the program at a more abstract level.

– This logic can make use of research results on pointer analysis. Many of the
premises required when applying proof rules can be proved automatically by pointer
analysis. For example, for all assignment statements of the form $v := e$, the
premise $\&v \neq \mathrm{nil}$ can be proved easily by a pointer analyzer. For assignment
statements of the form $*p := e$, the premise $p \neq \mathrm{nil}$ of the proof rule
ASSIGN-ST can also be verified automatically in many cases.

In the future, we will extend our logic to deal with more programming
language concepts: function calls, function pointers, classes/objects, generics, and so on.
In the meantime, we will try to build a library of pre-defined DRFs, MSFs, and
their properties for frequently used data structures.
A Another example: inserting a node into a binary search
tree

Example 4. The program depicted in Figure 5 adds a new tuple (k, d) to the
map represented by a binary search tree. The types of the program variables k
and d are both integer. The type of the program variables rt and tmp is P(T),
where T is REC((l, P(T)) × (r, P(T)) × (K, integer) × (D, integer)). The type
of p is P(P(T)).

The DRFs depicted in Figure 6 are used in the specification and verification
of the program depicted in Figure 5. Suppose *x points to the root node of a binary
search tree, and y is the address of a child field of a node of this tree. The
DRF PNodeSet(x, y) retrieves the set of the child-pointer-field addresses (i.e.
addresses of the fields l and r) of all the nodes in the binary search tree derived
by setting *y to nil. The argument x is also in this set. MapPP(x, y) retrieves the
map represented by this modified binary search tree.

The boolean-typed DRF isHBSTK(x, y) says that if we make *y point to a
newly allocated node {nil, nil, k, d}, *x is still the root node of a binary search
tree. The DRF DomK(x, y) retrieves the keys stored in this tree.



p := &rt;
while (*p != nil)
{
    if (k < (*p)→K) p := &(*p)→l else p := &(*p)→r;
}
tmp := alloc(T);
tmp→K := k; tmp→D := d;
*p := tmp;

Fig. 5. Another program



The (simplified) definitions of the corresponding MSFs are depicted in Figure 7.
We write $\mathfrak{M}(\mathrm{DomK})$, $\mathfrak{M}(\mathrm{isHBSTK})$, and $\mathfrak{M}(\mathrm{MapPP})$ as $\mathrm{DMK}_m$, $\mathrm{HBSTK}_m$, and $\mathrm{MPPP}_m$
respectively. Let P' be the set of function definitions depicted in Figure 2 and
Figure 6. Some of the properties of the DRFs in P' and the corresponding MSFs
are depicted in Figure 8. Some of the DRFs and MSFs in Section 7, together
with their properties, are reused in this verification.



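\[
\begin{aligned}
&\mathrm{DomK}(x, y) : P(P(T)) \times P(P(T)) \to \mathrm{SetOf}(\mathrm{integer})\\
&\quad \triangleq (x = y)\ ?\ \{k\} :\\
&\qquad ({*x} = \mathrm{nil})\ ?\ \emptyset : \{(*x) \to K\} \cup \mathrm{DomK}(\&(*x) \to l, y) \cup \mathrm{DomK}(\&(*x) \to r, y)\\
&\mathrm{isHBSTK}(x, y) : P(P(T)) \times P(P(T)) \to \mathrm{boolean}\\
&\quad \triangleq (x = y)\ ?\ \mathrm{TRUE} :\\
&\qquad ({*x} = \mathrm{nil})\ ?\ \mathrm{TRUE} :\\
&\qquad \mathrm{InHeap}(*x) \land \mathrm{isHBSTK}(\&(*x) \to l, y) \land \mathrm{isHBSTK}(\&(*x) \to r, y)\ \land\\
&\qquad (\mathrm{DomK}(\&(*x) \to l, y) = \emptyset\ ?\ \mathrm{TRUE} : \mathrm{MAX}(\mathrm{DomK}(\&(*x) \to l, y)) < (*x) \to K)\ \land\\
&\qquad (\mathrm{DomK}(\&(*x) \to r, y) = \emptyset\ ?\ \mathrm{TRUE} : (*x) \to K < \mathrm{MIN}(\mathrm{DomK}(\&(*x) \to r, y)))\\
&\mathrm{MapPP}(x, y) : P(P(T)) \times P(P(T)) \to \text{Map integer to integer}\\
&\quad \triangleq (x = y)\ ?\ \emptyset :\\
&\qquad ({*x} = \mathrm{nil})\ ?\ \emptyset :\\
&\qquad \mathrm{MapPP}(\&(*x) \to l, y) \dagger \mathrm{MapPP}(\&(*x) \to r, y) \dagger \{(*x) \to K \mapsto (*x) \to D\}\\
&\mathrm{PNodeSet}(x, y) : P(P(T)) \times P(P(T)) \to \mathrm{SetOf}(\mathrm{Ptr})\\
&\quad \triangleq \{x\} \cup ((x = y)\ ?\ \emptyset : ({*x} = \mathrm{nil})\ ?\ \emptyset : (\mathrm{PNodeSet}(\&(*x) \to l, y) \cup \mathrm{PNodeSet}(\&(*x) \to r, y)))
\end{aligned}
\]

Fig. 6. DRFs for specifying and verifying the program in Figure 5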



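\[
\begin{aligned}
&\mathrm{DMK}_m(x, y) : P(P(T)) \times P(P(T)) \to \mathrm{SetOf}(\mathrm{Ptr})\\
&\quad \triangleq (x = y)\ ?\ \{\&k\} :\\
&\qquad (\{x\} \cup (({*x} = \mathrm{nil})\ ?\ \emptyset : \{x, \&(*x) \to K\} \cup \mathrm{DMK}_m(\&(*x) \to l, y) \cup \mathrm{DMK}_m(\&(*x) \to r, y)))\\
&\mathrm{HBSTK}_m(x, y) : P(P(T)) \times P(P(T)) \to \mathrm{SetOf}(\mathrm{Ptr})\\
&\quad \triangleq (x = y)\ ?\ \emptyset :\\
&\qquad \{x\} \cup (({*x} = \mathrm{nil})\ ?\ \emptyset : \mathrm{HBSTK}_m(\&(*x) \to l, y) \cup \mathrm{HBSTK}_m(\&(*x) \to r, y)\ \cup\\
&\qquad \mathrm{DMK}_m(\&(*x) \to l, y) \cup (\mathrm{DomK}(\&(*x) \to l, y) = \emptyset\ ?\ \emptyset : \{\&(*x) \to K\})\ \cup\\
&\qquad \mathrm{DMK}_m(\&(*x) \to r, y) \cup (\mathrm{DomK}(\&(*x) \to r, y) = \emptyset\ ?\ \emptyset : \{\&(*x) \to K\}))\\
&\mathrm{MPPP}_m(x, y) : P(P(T)) \times P(P(T)) \to \mathrm{SetOf}(\mathrm{Ptr})\\
&\quad \triangleq (x = y)\ ?\ \emptyset :\\
&\qquad \{x\} \cup (({*x} = \mathrm{nil})\ ?\ \emptyset :\\
&\qquad \mathrm{MPPP}_m(\&(*x) \to l, y) \cup \mathrm{MPPP}_m(\&(*x) \to r, y) \cup \{\&(*x) \to K, \&(*x) \to D\})\\
&\mathrm{PNS}_m(x, y) : P(P(T)) \times P(P(T)) \to \mathrm{SetOf}(\mathrm{Ptr})\\
&\quad \triangleq (x = y)\ ?\ \emptyset : \{x\} \cup (({*x} = \mathrm{nil})\ ?\ \emptyset : (\mathrm{PNS}_m(\&(*x) \to l, y) \cup \mathrm{PNS}_m(\&(*x) \to r, y)))
\end{aligned}
\]

Fig. 7. MSFs of the DRFs in Figure 6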



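\[
\begin{aligned}
&P',\ \mathrm{isHBST}(*x),\ \mathrm{isHBSTK}(x, y) \vdash y \notin \mathrm{DMK}_m(x, y) \cup \mathrm{HBSTK}_m(x, y) \cup \mathrm{MPPP}_m(x, y) \cup \mathrm{PNS}_m(x, y) &&(20)\\
&P',\ \mathrm{isHBST}(*x),\ y \in \mathrm{PNodeSet}(x, y),\ {*y} \neq \mathrm{nil} \vdash {*y} \in \mathrm{NodeSet}(*x) &&(21)\\
&P',\ \mathrm{isHBST}(*x),\ y \in \mathrm{PNodeSet}(x, y) \land \mathrm{isHBSTK}(x, y) \land k < (*y) \to K \vdash\\
&\quad \&(*y) \to l \in \mathrm{PNodeSet}(x, \&(*y) \to l) \land \mathrm{isHBSTK}(x, \&(*y) \to l) &&(22)\\
&P',\ \mathrm{isHBST}(*x),\ y \in \mathrm{PNodeSet}(x, y) \land \mathrm{isHBSTK}(x, y) \land k > (*y) \to K \vdash\\
&\quad \&(*y) \to r \in \mathrm{PNodeSet}(x, \&(*y) \to r) \land \mathrm{isHBSTK}(x, \&(*y) \to r) &&(23)\\
&P',\ \mathrm{isHBSTK}(x, y),\ y \in \mathrm{PNodeSet}(x, y),\ \mathrm{InHeap}(*y),\\
&\quad (*y) \to K = k \land (*y) \to l = \mathrm{nil} \land (*y) \to r = \mathrm{nil} \vdash \mathrm{isHBST}(*x) &&(24)\\
&P',\ \mathrm{isHBST}(*x),\ y \in \mathrm{PNodeSet}(x, y) \vdash \mathrm{Map}(*x) = \mathrm{MapPP}(x, y) \dagger \mathrm{Map}(*y) &&(25)\\
&P',\ \mathrm{isHBST}(*x) \vdash \&p \notin \mathrm{DMK}_m(x, y) \cup \mathrm{HBSTK}_m(x, y) \cup \mathrm{MPPP}_m(x, y) &&(26)\\
&P',\ \mathrm{isHBST}(*x) \vdash \&\mathrm{tmp} \notin \mathrm{DMK}_m(x, y) \cup \mathrm{HBSTK}_m(x, y) \cup \mathrm{MPPP}_m(x, y) &&(27)
\end{aligned}
\]

Fig. 8. Some properties about the DRFs and MSFs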



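We use PRE-COND as the abbreviation for $\mathrm{isHBST}(\mathrm{rt}) \land k \notin \mathrm{Dom}(\mathrm{rt}) \land
\mathrm{Map}(\mathrm{rt}) = M_0$. The specification of this program is

\[ \text{PRE-COND}\ \{\text{The Program}\}\ \mathrm{isHBST}(\mathrm{rt}) \land \mathrm{Map}(\mathrm{rt}) = M_0 \dagger \{k \mapsto d\} \]

The sketch of the proof is as follows. The common premise of these assertions is
P', which is omitted for conciseness.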

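From the rule ASSIGN-ST, the properties in Figure 8 and $\&p \notin \{\&\mathrm{rt}, \&k\}$, we get the following two
assertions:

\[
\begin{array}{l}
(\text{PRE-COND} \land \mathrm{isHBSTK}(\&\mathrm{rt}, x) \land x \in \mathrm{PNodeSet}(\&\mathrm{rt}, x))[\&\mathrm{rt}/x]\\
\{p := \&\mathrm{rt};\}\\
(\text{PRE-COND} \land \mathrm{isHBSTK}(\&\mathrm{rt}, x) \land x \in \mathrm{PNodeSet}(\&\mathrm{rt}, x))[p/x]
\end{array}
\tag{28}
\]

\[
\begin{array}{l}
(\text{PRE-COND} \land (x \in \mathrm{PNodeSet}(\&\mathrm{rt}, x)) \land \mathrm{isHBSTK}(\&\mathrm{rt}, x))[\&(*p) \to r/x]\\
\{p := \&(*p) \to r\}\\
(\text{PRE-COND} \land (x \in \mathrm{PNodeSet}(\&\mathrm{rt}, x)) \land \mathrm{isHBSTK}(\&\mathrm{rt}, x))[p/x]
\end{array}
\tag{29}
\]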

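From the rule CONSEQUENCE, (23) and (29):

\[
\begin{array}{l}
\text{PRE-COND} \land (p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p)) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p) \land {*p} \neq \mathrm{nil} \land (k > (*p) \to K)\\
\{p := \&(*p) \to r\}\\
\text{PRE-COND} \land (p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p)) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p)
\end{array}
\tag{30}
\]

Similarly to the way we get (30), we have:

\[
\begin{array}{l}
\text{PRE-COND} \land (p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p)) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p) \land {*p} \neq \mathrm{nil} \land (k < (*p) \to K)\\
\{p := \&(*p) \to l\}\\
\text{PRE-COND} \land (p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p)) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p)
\end{array}
\tag{31}
\]

${*p} \neq \mathrm{nil}$ implies $k < (*p) \to K \lor k > (*p) \to K$. From the rule IF-ST, (30) and (31):

\[
\begin{array}{l}
\text{PRE-COND} \land (p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p)) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p) \land {*p} \neq \mathrm{nil}\\
\{\mathtt{if}\ (k < (*p) \to K)\ p := \&(*p) \to l\ \mathtt{else}\ p := \&(*p) \to r;\}\\
\text{PRE-COND} \land (p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p)) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p)
\end{array}
\tag{32}
\]

$p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p)$ implies $p \neq \mathrm{nil}$, thus ${*p} = \mathrm{nil} \lor {*p} \neq \mathrm{nil}$. From the rule
WHILE-ST and (32):

\[
\begin{array}{l}
\text{PRE-COND} \land (p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p)) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p)\\
\{\text{the while statement}\}\\
\text{PRE-COND} \land (p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p)) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p) \land {*p} = \mathrm{nil}
\end{array}
\tag{33}
\]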



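From the rule CONSEQUENCE, (33), (25), ${*p} = \mathrm{nil}$, and $\mathrm{Map}(\mathrm{nil}) = \emptyset$, we have

\[
\begin{array}{l}
\text{PRE-COND} \land (p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p)) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p)\\
\{\text{the while statement}\}\\
p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p) \land \mathrm{MapPP}(\&\mathrm{rt}, p) = M_0
\end{array}
\tag{34}
\]

From the rule ALLOC-ST and the fact that tmp is not relevant to any terms,
we have:

\[
\begin{array}{l}
p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p) \land \mathrm{MapPP}(\&\mathrm{rt}, p) = M_0\\
\{\mathrm{tmp} := \mathrm{alloc}(T);\}\\
p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p) \land \mathrm{MapPP}(\&\mathrm{rt}, p) = M_0\ \land\\
\mathrm{tmp} \neq \mathrm{nil} \land \mathrm{InHeap}(\mathrm{tmp}) \land \mathrm{Unique}(\&\mathrm{tmp}) \land \mathrm{PtrInit}(\mathrm{tmp})
\end{array}
\tag{35}
\]

\[
\begin{array}{l}
p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p) \land \mathrm{MapPP}(\&\mathrm{rt}, p) = M_0\ \land\\
\mathrm{tmp} \neq \mathrm{nil} \land \mathrm{InHeap}(\mathrm{tmp}) \land \mathrm{Unique}(\&\mathrm{tmp}) \land \mathrm{PtrInit}(\mathrm{tmp})\\
\{\mathrm{tmp} \to K := k;\ \mathrm{tmp} \to D := d;\}\\
p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p) \land \mathrm{MapPP}(\&\mathrm{rt}, p) = M_0\ \land\\
\mathrm{isHBST}(\mathrm{tmp}) \land \mathrm{Map}(\mathrm{tmp}) = \{k \mapsto d\}
\end{array}
\tag{36}
\]

From the rule ASSIGN-ST, (20) and $p \notin \{\&p\}$, we have:

\[
\begin{array}{l}
p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p) \land \mathrm{MapPP}(\&\mathrm{rt}, p) = M_0\ \land\\
\mathrm{isHBST}(\mathrm{tmp}) \land \mathrm{Map}(\mathrm{tmp}) = \{k \mapsto d\}\\
\{{*p} := \mathrm{tmp};\}\\
p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p) \land \mathrm{MapPP}(\&\mathrm{rt}, p) = M_0\ \land\\
\mathrm{isHBST}(*p) \land \mathrm{Map}(*p) = \{k \mapsto d\}
\end{array}
\tag{37}
\]

From the rule CONSEQUENCE, (24) and (25), we have:

\[
\begin{array}{l}
p \in \mathrm{PNodeSet}(\&\mathrm{rt}, p) \land \mathrm{isHBSTK}(\&\mathrm{rt}, p) \land \mathrm{MapPP}(\&\mathrm{rt}, p) = M_0\ \land\\
\mathrm{isHBST}(\mathrm{tmp}) \land \mathrm{Map}(\mathrm{tmp}) = \{k \mapsto d\}\\
\{{*p} := \mathrm{tmp};\}\\
\mathrm{isHBST}(\mathrm{rt}) \land \mathrm{Map}(\mathrm{rt}) = M_0 \dagger \{k \mapsto d\}
\end{array}
\tag{38}
\]

From the rule SEQ-ST and the assertions above, we prove the specification.

\[ \text{PRE-COND}\ \{\text{The Program}\}\ \mathrm{isHBST}(\mathrm{rt}) \land \mathrm{Map}(\mathrm{rt}) = M_0 \dagger \{k \mapsto d\} \tag{39} \]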
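The insertion program of Figure 5 walks a pointer to a pointer, which Python does not offer directly. The sketch below emulates the address of a field as a pair (owner object, field name); this encoding, and the names `Env`, `deref`, `store`, `insert`, `map_of` and `is_hbst`, are assumptions made for illustration. It then checks the specification (39) on one input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity-based equality, like pointer comparison
class Node:
    K: int
    D: int
    l: Optional["Node"] = None
    r: Optional["Node"] = None


class Env:
    """Holds the program variable rt so that its 'address' can be taken."""

    def __init__(self, rt=None):
        self.rt = rt


def deref(ref):
    owner, field = ref
    return getattr(owner, field)


def store(ref, value):
    owner, field = ref
    setattr(owner, field, value)


def insert(env, k, d):
    p = (env, "rt")                               # p := &rt
    while deref(p) is not None:                   # while (*p != nil)
        node = deref(p)
        p = (node, "l") if k < node.K else (node, "r")
    tmp = Node(k, d)                              # tmp := alloc(T); set K and D
    store(p, tmp)                                 # *p := tmp


def map_of(x):
    return {} if x is None else {x.K: x.D, **map_of(x.l), **map_of(x.r)}


def is_hbst(x, lo=None, hi=None):
    """Binary-search-tree check with key bounds (InHeap is not modelled)."""
    if x is None:
        return True
    if (lo is not None and x.K <= lo) or (hi is not None and x.K >= hi):
        return False
    return is_hbst(x.l, lo, x.K) and is_hbst(x.r, x.K, hi)


env = Env(Node(5, 50, Node(3, 30), Node(8, 80)))
M0 = map_of(env.rt)
insert(env, 4, 40)
```

Because `p` always denotes the slot that will receive the new node, the empty tree needs no special case, which matches the role of `PNodeSet(&rt, p)` in the invariant.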

B Verifying programs abstractly: the simplified
Schorr-Waite algorithm



The Schorr-Waite algorithm marks all nodes of a directed graph that are reachable
from one given node. The program depicted in Figure 9 is rewritten from a
simplified version presented by David Gries[H]. The variables tmp, p, q, root, vroot
are declared with type P(T), and T = REC((m, integer) × (l, P(T)) × (r, P(T))).
In this program, it is assumed for simplicity that each node has exactly two non-nil pointers
(i.e. the fields l and r). We use this program to show how to verify a program at
an abstract level. The verification presented here is just a sketch; many details
are omitted.



p := root; q := vroot; /* vroot→l = vroot→r = root */
while (p ≠ vroot)
{
    p→m := p→m + 1;
    if (p→m = 3 or (p→l)→m = 0)
    {
        tmp := p; p := p→l;
        tmp→l := tmp→r; tmp→r := q; q := tmp;
    }
    else
    {
        tmp := p→l; p→l := p→r;
        p→r := q; q := tmp
    }
}

Fig. 9. The simplified Schorr-Waite algorithm



The DRFs used in the (partial) specification and verification of this algorithm
are depicted in Figure 10. Intuitively speaking, the DRF StackPath(p) retrieves
the path from the virtual root vroot to the current node p. Pred(x) is used to
compute the predecessor of a node in the path. AcyclicSeq(x) is used to assert
that the path retrieved by StackPath(p) is acyclic.

Let G be the node set of the graph; let L(p) denote the original value of $p \to l$ and R(p)
the original value of $p \to r$; and let $\mathrm{SUCC}(x) = (x \to m = 1)\ ?\ R(x) : L(x)$. From [jj], the
following invariant of the while statement holds.



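\[
\begin{array}{l}
\forall x \in G \cdot (\ (x \to m = 0 \land x \to l = L(x) \land x \to r = R(x))\ \lor\\
\qquad (x \to m = 1 \land x \to l = R(x) \land \mathrm{SUCC}(x \to r) = x)\ \lor\\
\qquad (x \to m = 2 \land \mathrm{SUCC}(x \to l) = x \land x \to r = L(x))\ \lor\\
\qquad (x \to m = 3 \land x \to l = L(x) \land x \to r = R(x))\ )\\
\land\ (\ (p \to m = 0 \land (L(q) = p \lor R(q) = p))\ \lor\\
\qquad (p \to m = 1 \land q = L(p)) \lor (p \to m = 2 \land q = R(p))\ )\\
\land\ \mathrm{AcyclicSeq}(\mathrm{StackPath}(p)) \land p = \mathrm{head}(\mathrm{StackPath}(p))
\end{array}
\tag{40}
\]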



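\[
\begin{aligned}
&\mathrm{StackPath}(x) : P(T) \to \mathrm{SeqOf}(P(T))\\
&\quad \triangleq (x = \mathrm{vroot})\ ?\ [\mathrm{vroot}] : ([x] \frown \mathrm{StackPath}(\mathrm{Pred}(x)))\\
&\mathrm{Pred}(x) : P(T) \to P(T)\\
&\quad \triangleq (x \to m = 0)\ ?\ q : ((x \to m = 1)\ ?\ x \to r : x \to l)\\
&\mathrm{AcyclicSeq}(x) : \mathrm{SeqOf}(P(T))\\
&\quad \triangleq \mathrm{head}(x) \notin \mathrm{tail}(x) \land \mathrm{AcyclicSeq}(\mathrm{tail}(x))
\end{aligned}
\]

Fig. 10. The functions defined to prove the Schorr-Waite algorithm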



We write this invariant as INV. The following specifications of the body of 
the while statement can be proved. In these specifications, P and S are constants 
used to denote the original value of p and the path. 

\[
\begin{array}{l}
\mathrm{INV} \land p \to m = 0 \land L(p) \to m = 0 \land \mathrm{StackPath}(p) = S \land p = P\\
\{\text{The body of the while statement}\}\\
\mathrm{INV} \land P \to m = 1 \land \mathrm{StackPath}(p) = L(p) \frown S
\end{array}
\tag{41}
\]

\[
\begin{array}{l}
\mathrm{INV} \land p \to m = 1 \land R(p) \to m = 0 \land \mathrm{StackPath}(p) = S \land p = P\\
\{\text{The body of the while statement}\}\\
\mathrm{INV} \land P \to m = 2 \land \mathrm{StackPath}(p) = R(p) \frown S
\end{array}
\tag{42}
\]

\[
\begin{array}{l}
\mathrm{INV} \land p \to m = 2 \land \mathrm{StackPath}(p) = p \frown S\\
\{\text{The body of the while statement}\}\\
\mathrm{INV} \land p = R(p) \land P \to m = 3 \land \mathrm{StackPath}(p) = S
\end{array}
\tag{43}
\]

If we view StackPath(p) as a virtual variable, it can be seen that the body
of the while statement has different pragmatic meanings when the value of
$p \to m$ equals 0, 1, or 2. Based on these properties, we can view the abstract
program depicted in Figure 11 as an abstract version of the program in Figure 9.
From this abstract level, it is clear that the program in Figure 9 is in fact an
efficient and elaborate implementation of the depth-first-search algorithm. We
can continue proving the algorithm based on this abstract program. Though
assignment statements to abstract variables are not allowed in the code, the
abstract program can help us think.

p := root; S := ∅; push(vroot, S); push(p, S);
while (p ≠ vroot) do {
    p→m := p→m + 1;
    if (p→m = 1 ∧ L(p)→m = 0) {push(L(p), S);}
    else if (p→m = 2 ∧ R(p)→m = 0) {push(R(p), S);}
    else if (p→m = 3) {pop(S);}
    else skip
    p := top(S)
}

Fig. 11. The abstract version of the simplified Schorr-Waite algorithm
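The correspondence between the concrete and the abstract program can be observed on a small graph. The Python sketch below is an illustration under assumptions: the concrete loop uses the simultaneous-assignment reading of the two branches (the current node's l, r and the variable q rotate as a group), the class `Node` and the functions `build`, `concrete` and `abstract` are hypothetical names, and the two-node graph is an arbitrary example in which every node has two non-nil pointers.

```python
class Node:
    def __init__(self, name):
        self.name, self.m, self.l, self.r = name, 0, None, None


def build():
    """Two nodes A and B plus a virtual root whose fields both point to A."""
    a, b, v = Node("A"), Node("B"), Node("vroot")
    a.l, a.r = b, a
    b.l, b.r = a, b
    v.l = v.r = a
    return a, b, v


def concrete(root, vroot):
    """Pointer-reversing loop; returns the names of the nodes visited."""
    trace = []
    p, q = root, vroot
    while p is not vroot:
        trace.append(p.name)
        p.m += 1
        if p.m == 3 or p.l.m == 0:
            old = p
            # p, old.l, old.r, q := old.l, old.r, q, old (right side read first)
            p, old.l, old.r, q = old.l, old.r, q, old
        else:
            # p.l, p.r, q := p.r, q, p.l (right side read first)
            p.l, p.r, q = p.r, q, p.l
    return trace


def abstract(root, vroot):
    """Explicit-stack loop; pointers are never changed, so L(p) is p.l and R(p) is p.r."""
    trace = []
    p = root
    stack = [vroot, p]
    while p is not vroot:
        trace.append(p.name)
        p.m += 1
        if p.m == 1 and p.l.m == 0:
            stack.append(p.l)
        elif p.m == 2 and p.r.m == 0:
            stack.append(p.r)
        elif p.m == 3:
            stack.pop()
        p = stack[-1]
    return trace


a, b, v = build()
concrete_trace = concrete(a, v)
a2, b2, v2 = build()
abstract_trace = abstract(a2, v2)
```

On this graph both loops visit the nodes in the same order, every node ends with m = 3, and the concrete version leaves all l and r fields at their original values, which is what the invariant (40) predicts for marked nodes.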



